Cellulolytic enzymes, nucleic acids encoding them and methods for making and using them

ABSTRACT

The invention is directed to polypeptides having any cellulolytic activity, e.g., a cellulase activity, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, including thermostable and thermotolerant activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used in a variety of pharmaceutical, agricultural, food and feed processing and industrial contexts. The invention also provides compositions or products of manufacture comprising mixtures of enzymes comprising at least one enzyme of this invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 13/751,923 filed Jan. 28, 2013, now issued as U.S. Pat. No. 9,175,275; which is a divisional application of U.S. application Ser. No. 13/354,503 filed Jan. 20, 2012, now issued as U.S. Pat. No. 9,127,263; which is a divisional application of U.S. application Ser. No. 12/278,958 filed Apr. 29, 2009, now issued as U.S. Pat. No. 8,101,393; which is a 35 USC §371 National Stage application of International Application No. PCT/US2006/046919 filed Dec. 8, 2006, now expired; which claims the benefit under 35 USC §119(e) to U.S. Application Ser. No. 60/772,786 filed Feb. 10, 2006, now expired. The disclosure of each of the prior applications is considered part of and is incorporated by reference in the disclosure of this application.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Contract Nos. DOE 1435-04-03-CA-70224, 1435-04-04-CA-70224 and DE-FC36-03GO13146 awarded by the Department of Energy. The government has certain rights in the invention.

REFERENCE TO SEQUENCE LISTING SUBMITTED VIA EFS-WEB

This application is being transmitted by EFS-Web, as authorized and set forth in MPEP §502.05, including a sequence listing submitted under 37 C.F.R. §1.821 in ASCII text file (.txt) format. The entire content of the sequence listing, as identified below, is herein incorporated by reference in this application for all purposes.

File Name            Date of Creation    Size (bytes)
BP1150-2_ST25.txt    Oct. 22, 2015       1,800 KB

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to molecular and cellular biology and biochemistry. In one aspect, the invention provides polypeptides having a cellulolytic activity, e.g., a cellulase, an endoglucanase, a cellobiohydrolase, a beta-glucosidase, a xylanase, a mannanase, a xylosidase (e.g., a β-xylosidase), an arabinofuranosidase, and/or an oligomerase activity, polynucleotides encoding these polypeptides, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention provides polypeptides having an oligomerase activity, e.g., enzymes that convert soluble oligomers to fermentable monomeric sugars in the saccharification of biomass, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. In one aspect, the invention provides thermostable and thermotolerant forms of polypeptides of the invention. The polypeptides of the invention can be used in a variety of pharmaceutical, agricultural and industrial contexts.

2. Background Information

Cellulose is the most abundant renewable resource on earth. It is composed of a linear chain of β-1,4-linked glucose units with the repeating unit being cellobiose, which is a glucose dimer having a structure as shown in FIG. 5. The polymer is degraded by a suite of enzymes which include endoglucanases (EG), which randomly hydrolyze the cellulose polymer, and cellobiohydrolases (CBH), which remove terminal cellobiose residues from cellulose. Cellobiose and cello-oligosaccharides are hydrolyzed to glucose by β-glucosidases (BG). All three of these enzymes are necessary for the complete breakdown of cellulose to glucose. For each of these three enzymes different structural variants exist that perform the same function. In addition, fungi and bacteria are known to produce multiple forms of the same structural variants in addition to different structural variants.

Further complicating this system is the fact that some anaerobic bacteria and fungi are known to produce these enzymes in multi-enzyme complexes which contain multiple enzymes all attached to an enzyme scaffold with molecular weights above 2 million daltons. Why is such a complex system of enzymes necessary for such a simple molecule? Some researchers believe that this complexity is due to the recalcitrant nature of the substrate. The cellulose chains form microfibrils that pack into a crystalline matrix via hydrogen bonding of adjacent chains. This structure is highly resistant to chemical or enzymatic degradation.

CBHs are thought to be the key enzyme in the degradation of this crystalline cellulose because of the nature of their enzymatic attack on cellulose. EGs, unlike CBHs, have an open cleft that attacks the cellulose chain at a perpendicular angle. CBHs attack the chain directly via a tunnel containing the active site. The current thought is that the cellulose chains enter the tunnel and, at the same time, adjacent hydrogen bonding is disrupted. Once the cellobiohydrolases have established this “foothold” on the substrate, the EGs can then come in and more readily attack the substrate.

A major deficiency of known CBHs is their low catalytic activity. Some groups argue that the low activity stems from the fact that energy from hydrolysis is transferred to kinetic energy to disrupt hydrogen bonds and enable the enzyme to move along the substrate. CBHs are exo-acting enzymes and are found in 6 of the 90 families of glycosyl hydrolases. These are families 5, 6, 7, 9, 10 and 48. Family 5 contains many different types of glycosyl hydrolases including cellulases, mannanases and xylanases. Although most cellulases in this family are endoglucanases, there are examples of cellobiohydrolases, most notably CelO from Clostridium thermocellum. Family 6 contains only endoglucanases or cellobiohydrolases, with more cellobiohydrolase members than endoglucanases. The enzymes have an inverting mechanism and crystallographic studies suggest that the enzyme has a distorted α/β barrel structure containing seven, not eight, parallel β-strands. Family 7 enzymes are also composed of both endoglucanases and cellobiohydrolases, with more cellobiohydrolases, and the only known members are from fungi. The enzyme has a retaining mechanism and the crystal structure suggests a β-jellyroll structure. Family 9 contains endoglucanases, cellobiohydrolases and β-glucosidases with a preponderance of endoglucanases. However, Thermobifida fusca produces an endo/exo-1,4-glucanase, the crystal structure of which suggests a (α/α)₆ barrel fold. The enzyme has characteristics of both endo- and exo-glucanases (CBHs). Family 10 contains only 2 members described as cellobiohydrolases, with the rest mainly described as xylanases. Cellobiohydrolases and xylanases from family 10 have activity on methyl-umbelliferyl cellobioside. Family 48 contains mainly bacterial and anaerobic fungal cellobiohydrolases and endoglucanases. The structure is a (α/α)₆ barrel fold similar to family 9.

There is a need for less expensive and renewable sources of fuel for road vehicles. New fuel sources will be more attractive if they produce non-harmful end products after combustion. Ethanol offers an attractive alternative to petroleum-based fuels and can be obtained through the fermentation of monomeric sugars derived from starch or lignocellulose. However, current economics do not support the widespread use of ethanol due to the high cost of generating it. One area of research aimed at decreasing costs is enhancement of the technical efficacy of the enzymes that can be used to generate fermentable sugars from biomass, e.g., lignocellulose-comprising compositions. The development of enzymes that more efficiently digest biomass, e.g., feedstocks, will translate to decreased ethanol production costs. More efficient processes will decrease the United States' reliance on foreign oil and the price fluctuations that may be related to that reliance. Using cleaner fuels for transportation like bioethanol also may decrease net CO₂ emissions that are believed to be partially responsible for global warming.

Due to the complexity of biomass, its conversion to monomer sugars involves the action of several different enzyme classes, as illustrated in FIGS. 6, 7, 8, 62 and 63, which include schematics of the enzymes involved in digestion of cellulose (FIGS. 6, 7 and 63) and hemicellulose (FIGS. 8 and 62). Biomass is composed of both carbohydrate and non-carbohydrate materials. The carbohydrates can be sub-divided into cellulose, a linear polymer of β-1,4 linked glucose moieties, and hemicellulose, a complex branched polymer consisting of a main chain of β-1,4 linked xylose with branches of arabinose, galactose, mannose and glucuronic acids. On occasion the xylose may be acetylated and arabinose may contain ferulic or cinnamic acid esters to other hemicellulose chains or to lignin. The last major constituent of biomass is lignin, a highly crosslinked phenylpropanoid structure. Cellulases convert cellulose to glucose and are composed of: (1) endoglucanases, cleaving internal β-1,4 glycosidic linkages resulting in shorter chain glucooligosaccharides, (2) cellobiohydrolases, acting on the ends of the smaller oligosaccharides resulting in cellobiose (disaccharide), and (3) β-glucosidase, converting the soluble oligosaccharides (DP2 to DP7) to glucose. Single component enzymes have been shown to only partially digest cellulose and the concerted action of all enzymes is required for complete conversion to glucose. Many more enzymes are required to digest hemicellulose to sugar monomers including xylanase, xylosidase, arabinofuranosidase, mannanase, galactosidase and glucuronidase. Non-glycosyl hydrolases such as acetyl xylan esterase and ferulic acid esterase may also be involved.

SUMMARY OF THE INVENTION

The invention provides polypeptides having cellulolytic activity, e.g., cellulase activity, such as endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, and nucleic acids encoding them, and methods for making and using them. In one aspect, the enzymes of the invention have an increased catalytic rate to improve the process of substrate (e.g., cellulose) hydrolysis. This increased efficiency in catalytic rate leads to an increased efficiency in producing sugars, which can be useful in industrial applications, e.g., the sugars so produced can be used by microorganisms for ethanol production. In one aspect, the invention provides highly active (e.g., having an increased catalytic rate) endoglucanases, cellobiohydrolases, β-glucosidases (beta-glucosidases), xylanases, xylosidases (e.g., β-xylosidases), arabinofuranosidases, and/or oligomerases. The invention provides industrial applications (e.g., biomass to ethanol) using enzymes of the invention having decreased enzyme costs, e.g., decreased costs in biomass to ethanol conversion processes. Thus, the invention provides efficient processes for producing bioethanol and bioethanol-comprising compositions, including fuels comprising bioethanol, from any biomass.

In one aspect, enzymes of the invention, including the enzyme “cocktails” of the invention (“cocktails” meaning mixtures of enzymes comprising at least one enzyme of this invention), are used to hydrolyze the major components of a lignocellulosic biomass, or any composition comprising cellulose and/or hemicellulose (lignocellulosic biomass also comprises lignin), e.g., seeds, grains, tubers, plant waste or byproducts of food processing or industrial processing (e.g., stalks), corn (including cobs, stover, and the like), grasses (e.g., Indian grass, such as Sorghastrum nutans; or, switch grass, e.g., Panicum species, such as Panicum virgatum), wood (including wood chips, processing waste), paper, pulp, recycled paper (e.g., newspaper). In one aspect, enzymes of the invention are used to hydrolyze cellulose comprising a linear chain of β-1,4-linked glucose moieties, and/or hemicellulose as a complex structure that varies from plant to plant. In one aspect, enzymes of the invention are used to hydrolyze hemicelluloses containing a backbone of β-1,4 linked xylose molecules with intermittent branches of arabinose, galactose, glucuronic acid and/or mannose. In one aspect, enzymes of the invention are used to hydrolyze hemicellulose containing non-carbohydrate constituents such as acetyl groups on xylose and ferulic acid esters on arabinose. In one aspect, enzymes of the invention are used to hydrolyze hemicelluloses covalently linked to lignin and/or coupled to other hemicellulose strands via diferulate crosslinks.

In one aspect, the compositions and methods of the invention are used in the enzymatic digestion of biomass and can comprise use of many different enzymes, including the cellulases and hemicellulases. Cellulases used to practice the invention can digest cellulose to glucose. In one aspect, compositions used to practice the invention can include mixtures of enzymes, e.g., xylanases, xylosidases (e.g., β-xylosidases), cellobiohydrolases, and/or arabinofuranosidases or other enzymes that can digest hemicellulose to monomer sugars.

In one aspect, compositions used to practice the invention include a “cellulase” that is a mixture of at least three different enzyme types, (1) endoglucanase, which cleaves internal β-1,4 linkages resulting in shorter glucooligosaccharides, (2) cellobiohydrolase, which acts in an “exo” manner processively releasing cellobiose units (β-1,4 glucose-glucose disaccharide), and (3) β-glucosidase, releasing glucose monomer from short cellooligosaccharides (e.g., cellobiose).
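
The division of labor among the three enzyme types can be illustrated with a deliberately simplified toy model. The sketch below is not a kinetic model and is not taken from the specification; it assumes that each cellulose chain can be represented only by its degree of polymerization (DP), that the endoglucanase acts only on chains of DP 4 or more, and that the β-glucosidase acts on soluble oligomers of DP 2 to DP 7. The function names and the `use_beta_glucosidase` switch are hypothetical, chosen for illustration.

```python
import random


def endoglucanase(chains, rng):
    """Cleave one internal bond in a randomly chosen chain (toy rule: DP >= 4)."""
    candidates = [i for i, dp in enumerate(chains) if dp >= 4]
    if not candidates:
        return False
    dp = chains.pop(rng.choice(candidates))
    cut = rng.randint(2, dp - 2)  # both fragments keep DP >= 2
    chains.extend([cut, dp - cut])
    return True


def cellobiohydrolase(chains):
    """Release one cellobiose (DP 2) from the end of the longest chain with DP >= 3."""
    candidates = [i for i, dp in enumerate(chains) if dp >= 3]
    if not candidates:
        return False
    i = max(candidates, key=lambda k: chains[k])
    chains[i] -= 2
    chains.append(2)
    return True


def beta_glucosidase(chains):
    """Release one glucose (DP 1) from a soluble oligomer (assumed DP 2 to DP 7)."""
    for i, dp in enumerate(chains):
        if 2 <= dp <= 7:
            chains[i] = dp - 1
            chains.append(1)
            return True
    return False


def digest(initial_dp, use_beta_glucosidase=True, seed=0):
    """Apply the enzymes in rounds until none can act; return the final DP list."""
    rng = random.Random(seed)
    chains = [initial_dp]
    while True:
        acted = endoglucanase(chains, rng)
        acted = cellobiohydrolase(chains) or acted
        if use_beta_glucosidase:
            acted = beta_glucosidase(chains) or acted
        if not acted:
            return sorted(chains)


full = digest(50)                                  # all three enzyme types
partial = digest(50, use_beta_glucosidase=False)   # no beta-glucosidase
```

Under these assumptions, `full` consists only of glucose (DP 1), while `partial` still contains cellobiose (DP 2) that nothing in the reduced mixture can cleave, which mirrors the statement that the concerted action of the enzyme types is needed for complete conversion to glucose.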

In one aspect, the enzymes of the invention have a glucanase, e.g., an endoglucanase, activity, e.g., catalyzing hydrolysis of internal endo-β-1,4- and/or β-1,3-glucanase linkages. In one aspect, the endoglucanase activity (e.g., endo-1,4-beta-D-glucan 4-glucano hydrolase activity) comprises hydrolysis of 1,4- and/or β-1,3-beta-D-glycosidic linkages in cellulose, cellulose derivatives (e.g., carboxy methyl cellulose and hydroxy ethyl cellulose), lichenin, beta-1,4 bonds in mixed beta-1,3 glucans, such as cereal beta-D-glucans or xyloglucans, and other plant material containing cellulosic parts.

In one aspect, the enzymes of the invention have endoglucanase (e.g., endo-beta-1,4-glucanases, EC 3.2.1.4; endo-beta-1,3(1)-glucanases, EC 3.2.1.6; endo-beta-1,3-glucanases, EC 3.2.1.39) activity and can hydrolyze internal β-1,4- and/or β-1,3-glucosidic linkages in cellulose and glucan to produce smaller molecular weight glucose and glucose oligomers. The invention provides methods for producing smaller molecular weight glucose and glucose oligomers using these enzymes of the invention.

In one aspect, the enzymes of the invention are used to generate glucans, e.g., polysaccharides formed from 1,4-β- and/or 1,3-glycoside-linked D-glucopyranose. In one aspect, the endoglucanases of the invention are used in the food industry, e.g., for baking and fruit and vegetable processing, breakdown of agricultural waste, in the manufacture of animal feed, in pulp and paper production, textile manufacture and household and industrial cleaning agents. In one aspect, the enzymes, e.g., endoglucanases, of the invention are produced by a microorganism, e.g., by a fungus and/or a bacterium.

In one aspect, the enzymes, e.g., endoglucanases, of the invention are used to hydrolyze beta-glucans (β-glucans) which are major non-starch polysaccharides of cereals. The glucan content of a polysaccharide can vary significantly depending on variety and growth conditions. The physicochemical properties of this polysaccharide are such that it gives rise to viscous solutions or even gels under oxidative conditions. In addition, glucans have high water-binding capacity. All of these characteristics present problems for several industries including brewing, baking and animal nutrition. In brewing applications, the presence of glucan results in wort filterability and haze formation issues. In baking applications (especially for cookies and crackers), glucans can create sticky doughs that are difficult to machine and reduce biscuit size. Thus, the enzymes, e.g., endoglucanases, of the invention are used to decrease the amount of β-glucan in a β-glucan-comprising composition, e.g., enzymes of the invention are used in processes to decrease the viscosity of solutions or gels; to decrease the water-binding capacity of a composition, e.g., a β-glucan-comprising composition; in brewing processes (e.g., to increase wort filterability and decrease haze formation); and to decrease the stickiness of doughs, e.g., those for making cookies, breads, biscuits and the like.

In addition, carbohydrates (e.g., β-glucan) are implicated in rapid rehydration of baked products resulting in loss of crispiness and reduced shelf-life. Thus, the enzymes, e.g., endoglucanases, of the invention are used to retain crispiness, increase crispiness, or reduce the rate of loss of crispiness, and to increase the shelf-life of any carbohydrate-comprising food, feed or drink, e.g., a β-glucan-comprising food, feed or drink.

Enzymes, e.g., endoglucanases, of the invention are used to decrease the viscosity of gut contents (e.g., in animals, such as ruminant animals, or humans), e.g., those with cereal diets. Thus, in alternative aspects, enzymes, e.g., endoglucanases, of the invention are used to positively affect the digestibility of a food or feed and animal (e.g., human or domestic animal) growth rate, and in one aspect, are used to generate higher feed conversion efficiencies. For monogastric animal feed applications with cereal diets, beta-glucan is a contributing factor to viscosity of gut contents and thereby adversely affects the digestibility of the feed and animal growth rate. For ruminant animals, these beta-glucans represent substantial components of fiber intake and more complete digestion of glucans would facilitate higher feed conversion efficiencies. Accordingly, the invention provides animal feeds and foods comprising endoglucanases of the invention, and in one aspect, these enzymes are active in an animal digestive tract, e.g., in a stomach and/or intestine.

Enzymes, e.g., endoglucanases, of the invention are used to digest cellulose or any beta-1,4-linked glucan-comprising synthetic or natural material, including those found in any plant material. Enzymes, e.g., endoglucanases, of the invention are used as commercial enzymes to digest cellulose from any source, including all biological sources, such as plant biomasses, e.g., corn, grains, grasses (e.g., Indian grass, such as Sorghastrum nutans; or, switch grass, e.g., Panicum species, such as Panicum virgatum), or, woods or wood processing byproducts, e.g., in the wood processing, pulp and/or paper industry, in textile manufacture and in household and industrial cleaning agents, and/or in biomass waste processing.

In one aspect, the invention provides compositions (e.g., pharmaceutical compositions, foods, feeds, drugs, dietary supplements) comprising the enzymes, polypeptides or polynucleotides of the invention. These compositions can be formulated in a variety of forms, e.g., as tablets, gels, pills, implants, liquids, sprays, powders, food, feed pellets or as any type of encapsulated form.

The invention provides isolated, synthetic or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary nucleic acid of the invention, including SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:167, SEQ ID NO:169, SEQ ID NO:171, SEQ ID NO:173, SEQ ID NO:175, SEQ ID NO:177, SEQ ID NO:179, SEQ ID NO:181, SEQ ID NO:183, SEQ ID NO:185, SEQ ID NO:187, SEQ ID NO:189, SEQ ID NO:191, SEQ ID NO:193, SEQ ID NO:195, SEQ ID NO:197, SEQ ID NO:199, SEQ ID NO:201, SEQ ID NO:203, SEQ ID NO:205, SEQ ID NO:207, SEQ ID NO:209, SEQ ID NO:211, SEQ ID NO:213, SEQ ID NO:215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID NO:221, SEQ ID NO:223, SEQ ID NO:225, SEQ ID NO:227, SEQ ID NO:229, SEQ ID NO:231, SEQ ID NO:233, SEQ ID NO:235, SEQ ID NO:237, SEQ ID NO:239, SEQ ID NO:241, SEQ ID NO:243, SEQ ID NO:245, SEQ ID NO:247, SEQ ID NO:249, SEQ ID NO:251, SEQ ID NO:253, SEQ ID NO:255, SEQ ID NO:257, SEQ ID NO:259, SEQ ID NO:261, SEQ ID NO:263, SEQ ID NO:265, SEQ ID NO:267, SEQ ID NO:269, SEQ ID NO:271, SEQ ID NO:273, SEQ ID NO:275, SEQ ID NO:277, SEQ ID NO:279, SEQ ID NO:281, SEQ ID NO:283, SEQ ID NO:285, SEQ ID NO:287, SEQ ID NO:289, SEQ ID NO:291, SEQ ID NO:293, SEQ ID NO:295, SEQ ID NO:297, SEQ ID NO:299, SEQ ID NO:301, SEQ ID NO:303, SEQ ID NO:305, SEQ ID NO:307, SEQ ID NO:309, SEQ ID NO:311, SEQ ID NO:313, SEQ ID NO:315, SEQ ID NO:317, SEQ ID NO:319, SEQ ID NO:321, SEQ ID NO:323, SEQ ID NO:325, SEQ ID NO:327, SEQ ID NO:329, SEQ ID NO:331, SEQ ID NO:333, SEQ ID NO:335, SEQ ID NO:337, SEQ ID NO:339, SEQ ID NO:341, SEQ ID NO:343, SEQ ID NO:345, SEQ ID NO:347, SEQ ID NO:349, SEQ ID NO:351, SEQ ID NO:353, SEQ ID NO:355, SEQ ID NO:357, SEQ ID NO:359, SEQ ID NO:361, SEQ ID NO:363, SEQ ID NO:365, SEQ ID NO:367, SEQ ID NO:369, SEQ ID NO:371, SEQ ID NO:373, SEQ ID NO:375, SEQ ID NO:377, SEQ ID NO:379, SEQ ID NO:381, SEQ ID NO:383, SEQ ID NO:385, SEQ ID NO:387, SEQ ID NO:389, SEQ ID NO:391, SEQ ID NO:393, SEQ ID NO:395, SEQ ID NO:397, SEQ ID NO:399, SEQ ID NO:401, SEQ ID NO:403, SEQ ID NO:405, SEQ ID NO:407, SEQ ID NO:409, SEQ ID NO:411, SEQ ID NO:413, SEQ ID NO:415, SEQ ID NO:417, SEQ ID NO:419, SEQ ID NO:421, SEQ ID NO:423, SEQ ID NO:425, SEQ ID NO:427, SEQ ID NO:429, SEQ ID NO:431, SEQ ID NO:433, SEQ ID NO:435, SEQ ID NO:437, SEQ ID NO:439, SEQ ID NO:441, SEQ ID NO:443, SEQ ID NO:445, SEQ ID NO:447, SEQ ID NO:449, SEQ ID NO:451, SEQ ID NO:453, SEQ ID NO:455, SEQ ID NO:457, SEQ ID NO:459, SEQ ID NO:461, SEQ ID NO:463, SEQ ID NO:465, SEQ ID NO:467, SEQ ID NO:469, SEQ ID NO:471, SEQ ID NO:473, SEQ ID NO:475, SEQ ID NO:477, SEQ ID NO:479, SEQ ID NO:481, SEQ ID NO:483, SEQ ID NO:485, SEQ ID NO:487, SEQ ID NO:489, SEQ ID NO:491, SEQ ID NO:493, SEQ ID NO:495, SEQ ID NO:497, SEQ ID NO:499, SEQ ID NO:501, SEQ ID NO:503, SEQ ID NO:505, SEQ ID NO:507, SEQ ID NO:509, SEQ ID NO:511, SEQ ID NO:513, SEQ ID NO:515, SEQ ID NO:517, SEQ ID NO:519, SEQ ID NO:521 and/or SEQ ID NO:523; see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing, over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more residues; and in alternative aspects, these nucleic acids encode at least one polypeptide having a cellulolytic activity, e.g., a cellulase activity, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity. An oligomerase can, e.g., hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, or encode a polypeptide capable of generating an antibody that can specifically bind to a polypeptide of the invention, or, these nucleic acids can be used as probes for identifying or isolating cellulase-encoding nucleic acids, or to inhibit the expression of cellulase-expressing nucleic acids (all these aspects referred to as the “nucleic acids of the invention”). In one aspect, the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection.
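
The combination of a minimum percent identity and a minimum region length can be illustrated with a short sketch. It is only one possible reading and is not the sequence comparison algorithm of the specification: it assumes the two sequences have already been aligned by some external tool and are supplied as equal-length strings with `-` marking gaps, and it assumes identity is counted as matching columns divided by all compared columns (alignment tools differ on this denominator). The function names `identity_over_region` and `satisfies` are hypothetical.

```python
def identity_over_region(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of compared columns) for a pairwise alignment.

    Assumption: columns where both sequences have a gap are ignored, and a
    column with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared, compared


def satisfies(aligned_a, aligned_b, min_identity=50.0, min_region=10):
    """Check an identity threshold (percent) over a region of at least min_region residues."""
    identity, region = identity_over_region(aligned_a, aligned_b)
    return identity >= min_identity and region >= min_region


# Ten aligned residues with a single mismatch at the fifth position.
identity, region = identity_over_region("ACGTACGTAC", "ACGTTCGTAC")
```

Here `identity` is 90.0 over a `region` of 10 residues, so `satisfies("ACGTACGTAC", "ACGTTCGTAC", min_identity=90.0, min_region=10)` holds, whereas raising `min_identity` to 95.0 or `min_region` to 15 would not be met.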

Nucleic acids of the invention also include isolated, synthetic or recombinant nucleic acids encoding an exemplary enzyme of the invention, including a polypeptide having the sequence of (a sequence as set forth in) SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQ ID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180, SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ ID NO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQ ID NO:200, SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:209, SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ ID NO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQ ID NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236, SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ ID NO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQ ID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264, SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ ID NO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQ ID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292, SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ ID NO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQ ID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320, SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ ID NO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQ ID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348, SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ ID NO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQ ID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376, SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ ID NO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQ ID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404, SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ ID NO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQ ID NO:424, SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQ ID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, SEQ ID NO:520, SEQ ID NO:522 and/or SEQ ID NO:524; see also Tables 1, 2, and 3, Examples 1 and 4, below, and the Sequence Listing, and subsequences thereof and variants thereof. In one aspect, the polypeptide has a cellulolytic activity, e.g., a cellulase activity, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity. An oligomerase can, e.g., hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose.

In one aspect, the invention provides nucleic acids encodingcellulolytic enzymes, e.g., endoglucanase, cellobiohydrolase,β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g.,β-xylosidase), arabinofuranosidase, and/or oligomerase-encoding nucleicacids having a common novelty in that they are derived from mixedcultures. The invention provides cellulose or oligosaccharidehydrolyzing (degrading) enzyme-encoding nucleic acids isolated frommixed cultures comprising a polynucleotide of the invention, e.g., asequence having at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%,50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%,64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%,78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%,92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%)sequence identity to an exemplary nucleic acid of the invention, e.g.,SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ IDNO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ IDNO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ IDNO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ IDNO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ IDNO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ IDNO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ IDNO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ IDNO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ IDNO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ IDNO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119,SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ IDNO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147,SEQ ID NO:149, SEQ ID NO:151, 
SEQ ID NO:153, SEQ ID NO:155, SEQ IDNO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163, SEQ ID NO:165, SEQID NO:167, SEQ ID NO:169, SEQ ID NO:171, SEQ ID NO:173, SEQ ID NO:175,SEQ ID NO:177, SEQ ID NO:179, SEQ ID NO:181, SEQ ID NO:183, SEQ IDNO:185, SEQ ID NO:187, SEQ ID NO:189, SEQ ID NO:191, SEQ ID NO:193, SEQID NO:195, SEQ ID NO:197, SEQ ID NO:199, SEQ ID NO:201, SEQ ID NO:203,SEQ ID NO:205, SEQ ID NO:207, SEQ ID NO:209, SEQ ID NO:211, SEQ IDNO:213, SEQ ID NO:215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID NO:221, SEQID NO:223, SEQ ID NO:225, SEQ ID NO:227, SEQ ID NO:229, SEQ ID NO:231,SEQ ID NO:233, SEQ ID NO:235, SEQ ID NO:237, SEQ ID NO:239, SEQ IDNO:241, SEQ ID NO:243, SEQ ID NO:245, SEQ ID NO:247, SEQ ID NO:249, SEQID NO:251, SEQ ID NO:253, SEQ ID NO:255, SEQ ID NO:257, SEQ ID NO:259,SEQ ID NO:261, SEQ ID NO:263, SEQ ID NO:265, SEQ ID NO:267, SEQ IDNO:269, SEQ ID NO:271, SEQ ID NO:273, SEQ ID NO:275, SEQ ID NO:277, SEQID NO:279, SEQ ID NO:281, SEQ ID NO:283, SEQ ID NO:285, SEQ ID NO:287,SEQ ID NO:289, SEQ ID NO:291, SEQ ID NO:293, SEQ ID NO:295, SEQ IDNO:297, SEQ ID NO:299, SEQ ID NO:301, SEQ ID NO:303, SEQ ID NO:305, SEQID NO:307, SEQ ID NO:309, SEQ ID NO:311, SEQ ID NO:313, SEQ ID NO:315,SEQ ID NO:317, SEQ ID NO:319, SEQ ID NO:321, SEQ ID NO:323, SEQ IDNO:325, SEQ ID NO:327, SEQ ID NO:329, SEQ ID NO:331, SEQ ID NO:333, SEQID NO:335, SEQ ID NO:337, SEQ ID NO:339, SEQ ID NO:341, SEQ ID NO:343,SEQ ID NO:345, SEQ ID NO:347, SEQ ID NO:349, SEQ ID NO:351, SEQ IDNO:353, SEQ ID NO:355, SEQ ID NO:357, SEQ ID NO:359, SEQ ID NO:361, SEQID NO:363, SEQ ID NO:365, SEQ ID NO:367, SEQ ID NO:369, SEQ ID NO:371,SEQ ID NO:373, SEQ ID NO:375, SEQ ID NO:377, SEQ ID NO:379, SEQ IDNO:381, SEQ ID NO:383, SEQ ID NO:385, SEQ ID NO:387, SEQ ID NO:389, SEQID NO:391, SEQ ID NO:393, SEQ ID NO:395, SEQ ID NO:397, SEQ ID NO:399,SEQ ID NO:401, SEQ ID NO:403, SEQ ID NO:405, SEQ ID NO:407, SEQ IDNO:409, SEQ ID NO:411, SEQ ID NO:413, SEQ ID NO:415, SEQ ID NO:417, SEQID NO:419, SEQ ID NO:421, SEQ 
ID NO:423, SEQ ID NO:425, SEQ ID NO:427, SEQ ID NO:429, SEQ ID NO:431, SEQ ID NO:433, SEQ ID NO:435, SEQ ID NO:437, SEQ ID NO:439, SEQ ID NO:441, SEQ ID NO:443, SEQ ID NO:445, SEQ ID NO:447, SEQ ID NO:449, SEQ ID NO:451, SEQ ID NO:453, SEQ ID NO:455, SEQ ID NO:457, SEQ ID NO:459, SEQ ID NO:461, SEQ ID NO:463, SEQ ID NO:465, SEQ ID NO:467, SEQ ID NO:469, SEQ ID NO:471, SEQ ID NO:473, SEQ ID NO:475, SEQ ID NO:477, SEQ ID NO:479, SEQ ID NO:481, SEQ ID NO:483, SEQ ID NO:485, SEQ ID NO:487, SEQ ID NO:489, SEQ ID NO:491, SEQ ID NO:493, SEQ ID NO:495, SEQ ID NO:497, SEQ ID NO:499, SEQ ID NO:501, SEQ ID NO:503, SEQ ID NO:505, SEQ ID NO:507, SEQ ID NO:509, SEQ ID NO:511, SEQ ID NO:513, SEQ ID NO:515, SEQ ID NO:517, SEQ ID NO:519, SEQ ID NO:521 and/or SEQ ID NO:523 and see Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing, over a region of at least about 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, or more.

In one aspect, the invention provides nucleic acids encoding cellulolytic enzymes, e.g., endoglucanase enzyme, cellobiohydrolase enzyme, β-glucosidase enzyme (beta-glucosidase enzyme), xylanase enzyme, xylosidase (e.g., β-xylosidase) enzyme, arabinofuranosidase enzyme, and/or oligomerase enzyme-encoding nucleic acids, including exemplary polynucleotide sequences of the invention, see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing, and the polypeptides encoded by them, including enzymes of the invention, e.g., exemplary polypeptides of the invention, e.g., SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQ ID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180, SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ ID NO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQ ID
NO:200, SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:209,SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ IDNO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQID NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236,SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ IDNO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264,SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ IDNO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292,SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ IDNO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320,SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ IDNO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348,SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ IDNO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376,SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ IDNO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404,SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ IDNO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQID NO:424, SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432,SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ IDNO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460,SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ 
ID NO:470, SEQ ID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, SEQ ID NO:520, SEQ ID NO:522 and/or SEQ ID NO:524 (see also Table 1 and Sequence Listing), having a common novelty in that they are derived from a common source, e.g., an environmental source. Table 3, below, indicates the source of each enzyme of the invention. In one aspect, the invention also provides cellulase enzyme-, e.g., endoglucanase enzyme, cellobiohydrolase enzyme, β-glucosidase enzyme (beta-glucosidase enzyme), xylanase enzyme, xylosidase (e.g., β-xylosidase), arabinofuranosidase enzyme, and/or oligomerase enzyme-encoding nucleic acids with a common novelty in that they are derived from environmental sources, e.g., mixed environmental sources.

In one aspect, the sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastp -d “nr pataa” -F F, and all other options are set to default.
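
As an illustration only of what a percent sequence identity "over a region" means, the following minimal Python sketch computes identity between two sequences that have already been aligned. It is not the BLAST algorithm referred to above; the function name `percent_identity`, the use of `-` as a gap character, and the choice to count a gap opposite a residue as a mismatch are all assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the compared columns of two pre-aligned sequences.

    Assumptions (hypothetical, for illustration): '-' marks a gap, a gap
    opposite a residue counts as a mismatch, and columns in which both
    sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Two toy aligned fragments; five of six compared columns match.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

A value computed this way depends on the alignment supplied and on how gaps are scored, so it is only comparable to a reported identity when the same alignment method and settings are used.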

Another aspect of the invention is an isolated, synthetic or recombinant nucleic acid including at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more consecutive bases of a nucleic acid sequence of the invention, sequences substantially identical thereto, and the sequences complementary thereto.

In one aspect, the isolated, synthetic or recombinant nucleic acids of the invention encode a polypeptide having a cellulolytic activity, e.g., a cellulase activity, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, e.g., can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, which is thermostable. The polypeptide can retain a cellulase or an oligomerase activity under conditions comprising a temperature range of between about 37° C. to about 95° C., between about 55° C. to about 85° C., between about 70° C. to about 95° C., or between about 90° C. to about 95° C. The polypeptide can retain a cellulase or an oligomerase activity in temperatures in the range between about 1° C. to about 5° C., between about 5° C. to about 15° C., between about 15° C. to about 25° C., between about 25° C. to about 37° C., between about 37° C. to about 95° C., 96° C., 97° C., 98° C. or 99° C., between about 55° C. to about 85° C., between about 70° C. to about 75° C., or between about 90° C. to about 99° C., or 95° C., 96° C., 97° C., 98° C. or 99° C., or more.

In another aspect, the isolated, synthetic or recombinant nucleic acid encodes a polypeptide having a cellulolytic activity, e.g., a cellulase activity, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, e.g., can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, which is thermotolerant. The polypeptide can retain a cellulase or an oligomerase activity after exposure to a temperature in the range from greater than 37° C. to about 95° C. or anywhere in the range from greater than 55° C. to about 85° C. The polypeptide can retain a cellulase or an oligomerase activity after exposure to a temperature in the range between about 1° C. to about 5° C., between about 5° C. to about 15° C., between about 15° C. to about 25° C., between about 25° C. to about 37° C., between about 37° C. to about 95° C., 96° C., 97° C., 98° C. or 99° C., between about 55° C. to about 85° C., between about 70° C. to about 75° C., or between about 90° C. to about 95° C., or more. In one aspect, the polypeptide retains a cellulase or an oligomerase activity after exposure to a temperature in the range from greater than 90° C. to about 99° C., or 95° C., 96° C., 97° C., 98° C. or 99° C., at about pH 4.5, or more.

The invention provides isolated, synthetic or recombinant nucleic acids comprising a sequence that hybridizes under stringent conditions to a nucleic acid of the invention, including an exemplary sequence of the invention, e.g., the sequence of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:167, SEQ ID NO:169, SEQ ID NO:171, SEQ ID NO:173, SEQ ID NO:175, SEQ ID NO:177, SEQ ID NO:179, SEQ ID NO:181, SEQ ID NO:183, SEQ ID NO:185, SEQ ID NO:187, SEQ ID NO:189, SEQ ID NO:191, SEQ ID NO:193, SEQ ID NO:195, SEQ ID NO:197, SEQ ID NO:199, SEQ ID NO:201, SEQ ID NO:203, SEQ ID NO:205, SEQ ID NO:207, SEQ ID NO:209, SEQ ID NO:211, SEQ ID NO:213, SEQ ID NO:215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID NO:221, SEQ ID NO:223, SEQ ID NO:225, SEQ ID NO:227, SEQ ID NO:229, SEQ ID NO:231, SEQ ID NO:233, SEQ ID NO:235, SEQ ID NO:237, SEQ ID NO:239, SEQ ID NO:241, SEQ ID NO:243, SEQ ID
NO:245, SEQ IDNO:247, SEQ ID NO:249, SEQ ID NO:251, SEQ ID NO:253, SEQ ID NO:255, SEQID NO:257, SEQ ID NO:259, SEQ ID NO:261, SEQ ID NO:263, SEQ ID NO:265,SEQ ID NO:267, SEQ ID NO:269, SEQ ID NO:271, SEQ ID NO:273, SEQ IDNO:275, SEQ ID NO:277, SEQ ID NO:279, SEQ ID NO:281, SEQ ID NO:283, SEQID NO:285, SEQ ID NO:287, SEQ ID NO:289, SEQ ID NO:291, SEQ ID NO:293,SEQ ID NO:295, SEQ ID NO:297, SEQ ID NO:299, SEQ ID NO:301, SEQ IDNO:303, SEQ ID NO:305, SEQ ID NO:307, SEQ ID NO:309, SEQ ID NO:311, SEQID NO:313, SEQ ID NO:315, SEQ ID NO:317, SEQ ID NO:319, SEQ ID NO:321,SEQ ID NO:323, SEQ ID NO:325, SEQ ID NO:327, SEQ ID NO:329, SEQ IDNO:331, SEQ ID NO:333, SEQ ID NO:335, SEQ ID NO:337, SEQ ID NO:339, SEQID NO:341, SEQ ID NO:343, SEQ ID NO:345, SEQ ID NO:347, SEQ ID NO:349,SEQ ID NO:351, SEQ ID NO:353, SEQ ID NO:355, SEQ ID NO:357, SEQ IDNO:359, SEQ ID NO:361, SEQ ID NO:363, SEQ ID NO:365, SEQ ID NO:367, SEQID NO:369, SEQ ID NO:371, SEQ ID NO:373, SEQ ID NO:375, SEQ ID NO:377,SEQ ID NO:379, SEQ ID NO:381, SEQ ID NO:383, SEQ ID NO:385, SEQ IDNO:387, SEQ ID NO:389, SEQ ID NO:391, SEQ ID NO:393, SEQ ID NO:395, SEQID NO:397, SEQ ID NO:399, SEQ ID NO:401, SEQ ID NO:403, SEQ ID NO:405,SEQ ID NO:407, SEQ ID NO:409, SEQ ID NO:411, SEQ ID NO:413, SEQ IDNO:415, SEQ ID NO:417, SEQ ID NO:419, SEQ ID NO:421, SEQ ID NO:423, SEQID NO:425, SEQ ID NO:427, SEQ ID NO:429, SEQ ID NO:431, SEQ ID NO:433,SEQ ID NO:435, SEQ ID NO:437, SEQ ID NO:439, SEQ ID NO:441, SEQ IDNO:443, SEQ ID NO:445, SEQ ID NO:447, SEQ ID NO:449, SEQ ID NO:451, SEQID NO:453, SEQ ID NO:455, SEQ ID NO:457, SEQ ID NO:459, SEQ ID NO:461,SEQ ID NO:463, SEQ ID NO:465, SEQ ID NO:467, SEQ ID NO:469, SEQ IDNO:471, SEQ ID NO:473, SEQ ID NO:475, SEQ ID NO:477, SEQ ID NO:479, SEQID NO:481, SEQ ID NO:483, SEQ ID NO:485, SEQ ID NO:487, SEQ ID NO:489,SEQ ID NO:491, SEQ ID NO:493, SEQ ID NO:495, SEQ ID NO:497, SEQ IDNO:499, SEQ ID NO:501, SEQ ID NO:503, SEQ ID NO:505, SEQ ID NO:507, SEQID NO:509, SEQ ID NO:511, SEQ ID NO:513, SEQ ID 
NO:515, SEQ ID NO:517, SEQ ID NO:519, SEQ ID NO:521 and/or SEQ ID NO:523 (see also Tables 1, 2, and 3, Examples 1 and 4, below), or fragments or subsequences thereof. In one aspect, the nucleic acid encodes a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, or can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose. The nucleic acid can be at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200 or more residues in length or the full length of the gene or transcript. In one aspect, the stringent conditions comprise a wash step comprising a wash in 0.2×SSC at a temperature of about 65° C. for about 15 minutes.

The invention provides a nucleic acid probe for identifying or isolating a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, or can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, wherein the probe comprises at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof, wherein the probe identifies the nucleic acid by binding or hybridization. The probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof.
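
To make the phrase "consecutive bases of a sequence" concrete, the following Python sketch lists every window of a chosen length from a nucleotide sequence. It is offered as an illustration only; the function name `candidate_probes` and the toy sequence are hypothetical, and the sketch does not assess hybridization behavior or probe specificity.

```python
from typing import List


def candidate_probes(sequence: str, length: int) -> List[str]:
    """Return every run of `length` consecutive bases found in `sequence`.

    Hypothetical helper for illustration: it only enumerates subsequences
    and makes no judgment about which would work as a probe in practice.
    """
    if length < 1:
        raise ValueError("length must be positive")
    seq = sequence.upper()
    # A sequence of n bases contains n - length + 1 windows of that length.
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    # Toy 12-base sequence (not a sequence of the invention).
    for window in candidate_probes("ACGTACGTACGT", 10):
        print(window)
```

A sequence shorter than the requested length yields an empty list, which reflects that no probe of that length can be drawn from it.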

The invention provides a nucleic acid probe for identifying or isolating a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, or can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, wherein the probe comprises a nucleic acid comprising a sequence at least about 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues of a nucleic acid of the invention, e.g., a polynucleotide having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary nucleic acid of the invention. In one aspect, the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection. In alternative aspects, the probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a nucleic acid sequence of the invention, or a subsequence thereof.

The invention provides an amplification primer pair for amplifying (e.g., by PCR) a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, or can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, wherein the primer pair is capable of amplifying a nucleic acid comprising a sequence of the invention, or fragments or subsequences thereof. One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50, or more, consecutive bases of the sequence, or about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more consecutive bases of the sequence. The invention provides amplification primer pairs, wherein the primer pair comprises a first member having a sequence as set forth by about the first (the 5′) 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more residues of a nucleic acid of the invention, and a second member having a sequence as set forth by about the first (the 5′) 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more residues of the complementary strand of the first member.
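
The primer pair described above, with one member taken from the 5′ end of a strand and the other from the 5′ end of its complementary strand, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `reverse_complement` and `primer_pair` are hypothetical, the "complementary strand" is read here as the complementary strand of the template nucleic acid, the template is a toy sequence rather than a sequence of the invention, and no melting temperature or specificity check is attempted.

```python
from typing import Tuple

# Watson-Crick complement table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 20) -> Tuple[str, str]:
    """Return (first member, second member) of a hypothetical primer pair.

    First member: the first (5') `length` residues of the template.
    Second member: the first (5') `length` residues of the complementary
    strand. The lower bound of 12 mirrors the shortest length listed in
    the text; enforcing it is an assumption of this sketch.
    """
    if not 12 <= length <= len(template):
        raise ValueError("length must be between 12 and the template length")
    first = template[:length]
    second = reverse_complement(template)[:length]
    return first, second


if __name__ == "__main__":
    toy_template = "ATGGCCAAGCTTGGATCCGAATTCTAA"  # toy sequence, illustration only
    forward, reverse = primer_pair(toy_template, 12)
    print(forward, reverse)
```

Taking the second member from the reverse complement means the two primers point toward each other across the template, which is what allows the region between them to be amplified.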

The invention provides cellulase-encoding, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase-encoding nucleic acids generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. The invention provides methods of making a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase, by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental library.

The invention provides methods of amplifying a nucleic acid encoding a polypeptide having a cellulase activity, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase, or can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, comprising amplification of a template nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic acid sequence of the invention, or fragments or subsequences thereof.

The invention provides expression cassettes comprising a nucleic acid of the invention or a subsequence thereof. In one aspect, the expression cassette can comprise the nucleic acid that is operably linked to a promoter. The promoter can be a viral, bacterial, mammalian or plant promoter. In one aspect, the plant promoter can be a potato, rice, corn, wheat, tobacco or barley promoter. The promoter can be a constitutive promoter. The constitutive promoter can comprise CaMV35S. In another aspect, the promoter can be an inducible promoter. In one aspect, the promoter can be a tissue-specific promoter or an environmentally regulated or a developmentally regulated promoter. Thus, the promoter can be, e.g., a seed-specific, a leaf-specific, a root-specific, a stem-specific or an abscission-induced promoter. In one aspect, the expression cassette can further comprise a plant or plant virus expression vector.

The invention provides cloning vehicles comprising an expression cassette (e.g., a vector) of the invention or a nucleic acid of the invention. The cloning vehicle can be a viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome. The viral vector can comprise an adenovirus vector, a retroviral vector or an adeno-associated viral vector. The cloning vehicle can comprise a bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC).

The invention provides transformed cells comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention, or a cloning vehicle of the invention. In one aspect, the transformed cell can be a bacterial cell, a mammalian cell, a fungal cell, a yeast cell, an insect cell or a plant cell. In one aspect, the plant cell can be a soybean, rapeseed, oilseed, tomato, sugar cane, cereal, potato, wheat, rice, corn, tobacco or barley cell.

The invention provides transgenic non-human animals comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention. In one aspect, the animal is a mouse, a rat, a pig, a goat or a sheep.

The invention provides transgenic plants comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention. The transgenic plant can be a cereal plant, a corn plant, a potato plant, a tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant, a rice plant, a barley plant or a tobacco plant.

The invention provides transgenic seeds comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention. The transgenic seed can be a cereal plant, a corn seed, a wheat kernel, an oilseed, a rapeseed, a soybean seed, a palm kernel, a sunflower seed, a sesame seed, a peanut or a tobacco plant seed.

The invention provides an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. The invention provides methods of inhibiting the translation of a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase enzyme message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. In one aspect, the antisense oligonucleotide is between about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length, e.g., 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more bases in length. The invention provides methods of inhibiting the translation of a cellulase enzyme, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase enzyme message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention.
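
As a simple illustration of a sequence "complementary to" a message, the Python sketch below derives a DNA oligonucleotide that is the reverse complement of a chosen region of an mRNA. The function name `antisense_oligo`, the choice of a DNA (rather than RNA or modified) oligonucleotide, the minimum length of 10 bases, and the toy message are assumptions for this example; it says nothing about which region of a real message would be an effective target.

```python
# Base-pairing partners in DNA for each base of an RNA message.
_RNA_TO_ANTISENSE_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(message: str, start: int, length: int) -> str:
    """Return a DNA oligonucleotide complementary to message[start:start+length].

    Hypothetical helper for illustration. The result is written 5' to 3',
    so it is the reverse complement of the targeted mRNA region.
    """
    if start < 0 or length < 10:
        raise ValueError("start must be >= 0 and length at least 10")
    target = message.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target region runs past the end of the message")
    try:
        complement = [_RNA_TO_ANTISENSE_DNA[base] for base in target]
    except KeyError as exc:
        raise ValueError(f"unexpected base in message: {exc}") from None
    return "".join(reversed(complement))


if __name__ == "__main__":
    toy_message = "GGAUGGCCAAGCUUCC"  # toy mRNA fragment, illustration only
    print(antisense_oligo(toy_message, 2, 12))
```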

The invention provides double-stranded inhibitory RNA (RNAi, or RNA interference) molecules (including small interfering RNA, or siRNAs, for inhibiting transcription, and microRNAs, or miRNAs, for inhibiting translation) comprising a subsequence of a sequence of the invention. In one aspect, the siRNA is between about 21 to 24 residues, or at least about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more duplex nucleotides in length. The invention provides methods of inhibiting the expression of a cellulase enzyme, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity, e.g., can hydrolyze (degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose, in a cell comprising administering to the cell or expressing in the cell a double-stranded inhibitory RNA (siRNA or miRNA), wherein the RNA comprises a subsequence of a sequence of the invention.

The invention provides isolated, synthetic or recombinant polypeptides comprising an amino acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary polypeptide or peptide of the invention over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350 or more residues, or over the full length of the polypeptide. In one aspect, the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection. Exemplary polypeptide or peptide sequences of the invention include SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID
NO:158, SEQ IDNO:160, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQID NO:170, SEQ ID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178,SEQ ID NO:180, SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ IDNO:188, SEQ ID NO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQID NO:198, SEQ ID NO:200, SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206,SEQ ID NO:209, SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ IDNO:216, SEQ ID NO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQID NO:226, SEQ ID NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234,SEQ ID NO:236, SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ IDNO:244, SEQ ID NO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQID NO:254, SEQ ID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262,SEQ ID NO:264, SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ IDNO:272, SEQ ID NO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQID NO:282, SEQ ID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290,SEQ ID NO:292, SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ IDNO:300, SEQ ID NO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQID NO:310, SEQ ID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318,SEQ ID NO:320, SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ IDNO:328, SEQ ID NO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQID NO:338, SEQ ID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346,SEQ ID NO:348, SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ IDNO:356, SEQ ID NO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQID NO:366, SEQ ID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374,SEQ ID NO:376, SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ IDNO:384, SEQ ID NO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQID NO:394, SEQ ID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402,SEQ ID NO:404, SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ IDNO:412, SEQ ID NO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQID NO:422, SEQ ID NO:424, SEQ ID NO:426, SEQ ID 
NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQ ID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, SEQ ID NO:520, SEQ ID NO:522 and/or SEQ ID NO:524 (see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing); and subsequences thereof and variants thereof. Exemplary polypeptides also include fragments of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600 or more residues in length, or over the full length of an enzyme. Polypeptide or peptide sequences of the invention include sequences encoded by a nucleic acid of the invention. Polypeptide or peptide sequences of the invention include polypeptides or peptides specifically bound by an antibody of the invention (e.g., epitopes), or polypeptides or peptides that can generate an antibody of the invention (e.g., an immunogen).

In one aspect, a polypeptide of the invention has at least one cellulase enzyme, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase enzyme activity. In alternative aspects, a polynucleotide of the invention encodes a polypeptide that has at least one cellulase enzyme, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity.

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase, β-glucosidase (beta-glucosidase), xylanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase activity is thermostable. The polypeptide can retain a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity under conditions comprising a temperature range of between about 1° C. to about 5° C., between about 5° C. to about 15° C., between about 15° C. to about 25° C., between about 25° C. to about 37° C., between about 37° C. to about 95° C., between about 55° C. to about 85° C., between about 70° C. to about 75° C., or between about 90° C. to about 95° C., or more. In another aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), mannanase, xylanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity can be thermotolerant. The polypeptide can retain a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity after exposure to a temperature in the range from greater than 37° C. to about 95° C., or in the range from greater than 55° C. to about 85° C. In one aspect, the polypeptide can retain a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, after exposure to a temperature in the range from greater than 90° C. to about 95° C. at pH 4.5.

Another aspect of the invention provides an isolated, synthetic or recombinant polypeptide or peptide comprising at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150 or more consecutive residues of a polypeptide or peptide sequence of the invention, sequences substantially identical thereto, and the sequences complementary thereto. The peptide can be, e.g., an immunogenic fragment, a motif (e.g., a binding site), a signal sequence, a prepro sequence or an active site.

The invention provides isolated, synthetic or recombinant nucleic acidscomprising a sequence encoding a polypeptide having a cellulase, e.g.,endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase),mannanase, xylanase, xylosidase (e.g., β-xylosidase),arabinofuranosidase, and/or oligomerase enzyme activity and a signalsequence, wherein the nucleic acid comprises a sequence of theinvention. The signal sequence can be derived from another cellulase,e.g., endoglucanase, cellobiohydrolase, β-glucosidase(beta-glucosidase), mannanase, xylanase, β-xylosidase,arabinofuranosidase, and/or oligomerase enzyme or a non-cellulase, e.g.,non-endoglucanase, non-cellobiohydrolase, non-β-glucosidase(non-beta-glucosidase), non-xylanase, non-mannanase, non-β-xylosidase,non-arabinofuranosidase, and/or non-oligomerase (a heterologous) enzyme.The invention provides isolated, synthetic or recombinant nucleic acidscomprising a sequence encoding a polypeptide having a cellulase, e.g.,endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase),xylanase, β-xylosidase, mannanase, arabinofuranosidase, and/oroligomerase enzyme activity, wherein the sequence does not contain asignal sequence and the nucleic acid comprises a sequence of theinvention. In one aspect, the invention provides an isolated, syntheticor recombinant polypeptide comprising a polypeptide of the inventionlacking all or part of a signal sequence. In one aspect, the isolated,synthetic or recombinant polypeptide can comprise the polypeptide of theinvention comprising a heterologous signal sequence, such as aheterologous cellulase, e.g., endoglucanase, cellobiohydrolase,β-glucosidase (beta-glucosidase), xylanase, mannanse, β-xylosidase,arabinofuranosidase, and/or oligomerase enzyme signal sequence ornon-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase,non-β-glucosidase (non-beta-glucosidase), non-xylanase, non-mannanse,non-β-xylosidase, non-arabinofuranosidase, and/or non-oligomerase enzymesignal sequence.

In one aspect, the invention provides chimeric proteins comprising a first domain comprising a signal sequence of the invention and at least a second domain. The protein can be a fusion protein. The second domain can comprise an enzyme. Alternatively, the second domain can comprise a non-enzyme.

The invention provides chimeric polypeptides comprising at least a first domain comprising a signal peptide (SP), a prepro sequence and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro sequence and/or catalytic domain (CD). In one aspect, the heterologous polypeptide or peptide is not a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, xylosidase (e.g., β-xylosidase), arabinofuranosidase, and/or oligomerase enzyme. The heterologous polypeptide or peptide can be amino terminal to, carboxy terminal to or on both ends of the signal peptide (SP), prepro sequence and/or catalytic domain (CD).

The invention provides isolated, synthetic or recombinant nucleic acids encoding a chimeric polypeptide, wherein the chimeric polypeptide comprises at least a first domain comprising a signal peptide (SP), a prepro domain and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro domain and/or catalytic domain (CD).

The invention provides isolated, synthetic or recombinant signal sequences (e.g., signal peptides) consisting of or comprising the sequence of (a sequence as set forth in) residues 1 to 14, 1 to 15, 1 to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38, 1 to 40, 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46 or 1 to 47, of a polypeptide of the invention, e.g., the exemplary SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQ ID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180, SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ ID NO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQ ID NO:200, SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:209, SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ ID NO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQ ID NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236, SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ ID NO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQ ID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264, SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ ID NO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQ ID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292, SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ ID NO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQ ID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320, SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ ID NO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQ ID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348, SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ ID NO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQ ID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376, SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ ID NO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQ ID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404, SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ ID NO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQ ID NO:424, SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQ ID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, SEQ ID NO:520, SEQ ID NO:522 and/or SEQ ID NO:524 (see Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing). In one aspect, the invention provides signal sequences comprising the first 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70 or more amino terminal residues of a polypeptide of the invention.
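The "residues 1 to N" and "first N amino terminal residues" language above can be illustrated with a minimal Python sketch. The function name, the default bounds (14 and 47, taken from the first list above) and the toy sequence are assumptions made for illustration only; they are not part of the Sequence Listing, and the sketch does not predict where a signal peptide is actually cleaved.

```python
def candidate_signal_sequences(polypeptide: str, min_len: int = 14, max_len: int = 47) -> dict:
    """Return the amino-terminal prefixes 'residues 1 to n' for each n in [min_len, max_len].

    Prefix lengths that exceed the polypeptide length are skipped.
    """
    seq = polypeptide.strip().upper()
    return {n: seq[:n] for n in range(min_len, max_len + 1) if n <= len(seq)}


# Hypothetical 60-residue toy sequence (not a sequence of the invention).
toy = "M" + "A" * 59
prefixes = candidate_signal_sequences(toy)
print(len(prefixes), prefixes[14])
```

Each dictionary key is the residue count n and each value is the corresponding prefix, so a downstream step could test every candidate length in turn.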

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprises a specific activity at about 37° C. in the range from about 1 to about 1200 units per milligram of protein, or, about 100 to about 1000 units per milligram of protein. In another aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprises a specific activity from about 100 to about 1000 units per milligram of protein, or, from about 500 to about 750 units per milligram of protein. Alternatively, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprises a specific activity at 37° C. in the range from about 1 to about 750 units per milligram of protein, or, from about 500 to about 1200 units per milligram of protein. In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprises a specific activity at 37° C. in the range from about 1 to about 500 units per milligram of protein, or, from about 750 to about 1000 units per milligram of protein. In another aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprises a specific activity at 37° C. in the range from about 1 to about 250 units per milligram of protein. Alternatively, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprises a specific activity at 37° C. in the range from about 1 to about 100 units per milligram of protein.

In another aspect, the thermotolerance comprises retention of at least half of the specific activity of the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme at 37° C. after being heated to the elevated temperature. Alternatively, the thermotolerance can comprise retention of specific activity at 37° C. in the range from about 1 to about 1200 units per milligram of protein, or, from about 500 to about 1000 units per milligram of protein, after being heated to the elevated temperature. In another aspect, the thermotolerance can comprise retention of specific activity at 37° C. in the range from about 1 to about 500 units per milligram of protein after being heated to the elevated temperature.
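The "at least half of the specific activity" criterion above is simple arithmetic, sketched here in Python. The function names and the example numbers are hypothetical; the sketch assumes activity is already expressed in units and protein in milligrams, and that both measurements are made at 37° C.

```python
def specific_activity(units: float, protein_mg: float) -> float:
    """Specific activity in units per milligram of protein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return units / protein_mg


def retains_half_activity(before: float, after: float, min_fraction: float = 0.5) -> bool:
    """True if the specific activity measured after heating is at least
    min_fraction of the specific activity measured before heating."""
    if before <= 0:
        raise ValueError("reference specific activity must be positive")
    return after / before >= min_fraction


# Hypothetical measurements: 600 units in 2 mg before heating, 320 units in 2 mg after.
before = specific_activity(600.0, 2.0)
after = specific_activity(320.0, 2.0)
print(before, after, retains_half_activity(before, after))
```

The 0.5 default mirrors the "at least half" wording; a different threshold can be passed when another retention level is of interest.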

The invention provides the isolated, synthetic or recombinant polypeptide of the invention, wherein the polypeptide comprises at least one glycosylation site. In one aspect, glycosylation can be an N-linked glycosylation. In one aspect, the polypeptide can be glycosylated after being expressed in P. pastoris or S. pombe.

In one aspect, the polypeptide can retain cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity under conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4 or more acidic pH. In another aspect, the polypeptide can retain cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity under conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11 or more basic pH. In one aspect, the polypeptide can retain cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity after exposure to conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4 or more acidic pH. In another aspect, the polypeptide can retain cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity after exposure to conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11 or more basic pH.

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention has activity under alkaline conditions, e.g., the alkaline conditions of the gut, e.g., the small intestine. In one aspect, the polypeptide can retain activity after exposure to the acidic pH of the stomach.

The invention provides protein preparations comprising a polypeptide (including peptides) of the invention, wherein the protein preparation comprises a liquid, a solid or a gel. The invention provides heterodimers comprising a polypeptide of the invention and a second protein or domain. The second member of the heterodimer can be a different cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, a different enzyme or another protein. In one aspect, the second domain can be a polypeptide and the heterodimer can be a fusion protein. In one aspect, the second domain can be an epitope or a tag. In one aspect, the invention provides homodimers comprising a polypeptide of the invention.

The invention provides immobilized polypeptides (including peptides) having cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, wherein the immobilized polypeptide comprises a polypeptide of the invention, a polypeptide encoded by a nucleic acid of the invention, or a polypeptide comprising a polypeptide of the invention and a second domain. In one aspect, the polypeptide can be immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a microelectrode, a graphitic particle, a bead, a gel, a plate, an array or a capillary tube.

The invention also provides arrays comprising an immobilized nucleic acid of the invention, including, e.g., probes of the invention. The invention also provides arrays comprising an antibody of the invention.

The invention provides isolated, synthetic or recombinant antibodies that specifically bind to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention. These antibodies of the invention can be monoclonal or polyclonal antibodies. The invention provides hybridomas comprising an antibody of the invention, e.g., an antibody that specifically binds to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention. The invention provides nucleic acids encoding these antibodies.

The invention provides methods of isolating or identifying a polypeptide having cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprising the steps of: (a) providing an antibody of the invention; (b) providing a sample comprising polypeptides; and (c) contacting the sample of step (b) with the antibody of step (a) under conditions wherein the antibody can specifically bind to the polypeptide, thereby isolating or identifying a polypeptide having cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity.

The invention provides methods of making an anti-oligomerase; ananti-cellulase, e.g., anti-endoglucanase, anti-cellobiohydrolase,anti-β-glucosidase (anti-beta-glucosidase), anti-xylanase,anti-mannanse, anti-β-xylosidase, anti-arabinofuranosidase, and/oranti-oligomerase enzyme antibody comprising administering to a non-humananimal a nucleic acid of the invention or a polypeptide of the inventionor subsequences thereof in an amount sufficient to generate a humoralimmune response, thereby making an anti-oligomerase or anti-cellulase,e.g., anti-endoglucanase, anti-cellobiohydrolase, anti-β-glucosidase(anti-beta-glucosidase), anti-xylanase, anti-mannanse,anti-β-xylosidase, anti-arabinofuranosidase, and/or anti-oligomeraseenzyme antibody. The invention provides methods of making ananti-oligomerase or anti-cellulase, e.g., anti-endoglucanase,anti-cellobiohydrolase, anti-β-glucosidase (anti-beta-glucosidase),anti-xylanase, anti-mannanse, anti-β-xylosidase,anti-arabinofuranosidase, and/or anti-oligomerase immune response(cellular or humoral) comprising administering to a non-human animal anucleic acid of the invention or a polypeptide of the invention orsubsequences thereof in an amount sufficient to generate an immuneresponse (cellular or humoral).

The invention provides methods of producing a recombinant polypeptide comprising the steps of: (a) providing a nucleic acid of the invention operably linked to a promoter; and (b) expressing the nucleic acid of step (a) under conditions that allow expression of the polypeptide, thereby producing a recombinant polypeptide. In one aspect, the method can further comprise transforming a host cell with the nucleic acid of step (a) followed by expressing the nucleic acid of step (a), thereby producing a recombinant polypeptide in a transformed cell.

The invention provides methods for identifying a polypeptide having cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme substrate; and (c) contacting the polypeptide or a fragment or variant thereof of step (a) with the substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of a reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of the reaction product detects a polypeptide having cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity. In one aspect, the substrate is a cellulose-comprising or a polysaccharide-comprising (e.g., soluble cellooligosaccharide- and/or arabinoxylan oligomer-comprising) compound.

The invention provides methods for identifying a cellulase, e.g.,endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase),xylanase, mannanse, β-xylosidase, arabinofuranosidase, and/oroligomerase enzyme substrate comprising the following steps: (a)providing a polypeptide of the invention; or a polypeptide encoded by anucleic acid of the invention; (b) providing a test substrate; and (c)contacting the polypeptide of step (a) with the test substrate of step(b) and detecting a decrease in the amount of substrate or an increasein the amount of reaction product, wherein a decrease in the amount ofthe substrate or an increase in the amount of a reaction productidentifies the test substrate as a cellulase, e.g., endoglucanase,cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanse,β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme substrate.

The invention provides methods of determining whether a test compound specifically binds to a polypeptide comprising the following steps: (a) expressing a nucleic acid or a vector comprising the nucleic acid under conditions permissive for translation of the nucleic acid to a polypeptide, wherein the nucleic acid comprises a nucleic acid of the invention, or, providing a polypeptide of the invention; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide.

The invention provides methods for identifying a modulator of a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprising the following steps: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test compound; (c) contacting the polypeptide of step (a) with the test compound of step (b) and measuring an activity of the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, wherein a change in the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity measured in the presence of the test compound compared to the activity in the absence of the test compound provides a determination that the test compound modulates the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity. In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity can be measured by providing a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product, or, an increase in the amount of the substrate or a decrease in the amount of a reaction product. A decrease in the amount of the substrate or an increase in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an activator of cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity. An increase in the amount of the substrate or a decrease in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an inhibitor of cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity.
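The activator/inhibitor decision described above can be sketched in Python as a comparison of reaction product measured with and without the test compound. The function name, the `tolerance` parameter and the example values are assumptions added for illustration; the text itself does not specify a noise threshold or measurement units.

```python
def classify_test_compound(product_without: float, product_with: float,
                           tolerance: float = 0.0) -> str:
    """Classify a test compound from reaction-product amounts.

    More product with the compound than without it -> "activator";
    less product -> "inhibitor"; a difference within the (hypothetical)
    tolerance -> "no detectable effect".
    """
    delta = product_with - product_without
    if delta > tolerance:
        return "activator"
    if delta < -tolerance:
        return "inhibitor"
    return "no detectable effect"


# Hypothetical product amounts in arbitrary units.
print(classify_test_compound(10.0, 14.0))                 # more product with the compound
print(classify_test_compound(10.0, 6.0))                  # less product with the compound
print(classify_test_compound(10.0, 10.2, tolerance=0.5))  # within the noise tolerance
```

The same comparison can be run on substrate amounts by reversing the sign, since a decrease in substrate corresponds to an increase in product.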

The invention provides computer systems comprising a processor and a data storage device wherein said data storage device has stored thereon a polypeptide sequence or a nucleic acid sequence of the invention (e.g., a polypeptide or peptide encoded by a nucleic acid of the invention). In one aspect, the computer system can further comprise a sequence comparison algorithm and a data storage device having at least one reference sequence stored thereon. In another aspect, the sequence comparison algorithm comprises a computer program that indicates polymorphisms. In one aspect, the computer system can further comprise an identifier that identifies one or more features in said sequence. The invention provides computer readable media having stored thereon a polypeptide sequence or a nucleic acid sequence of the invention. The invention provides methods for identifying a feature in a sequence comprising the steps of: (a) reading the sequence using a computer program which identifies one or more features in a sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) identifying one or more features in the sequence with the computer program. The invention provides methods for comparing a first sequence to a second sequence comprising the steps of: (a) reading the first sequence and the second sequence through use of a computer program which compares sequences, wherein the first sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) determining differences between the first sequence and the second sequence with the computer program. The step of determining differences between the first sequence and the second sequence can further comprise the step of identifying polymorphisms. In one aspect, the method can further comprise an identifier that identifies one or more features in a sequence. In another aspect, the method can comprise reading the first sequence using a computer program and identifying one or more features in the sequence.
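The step of determining differences between a first and a second sequence can be sketched in Python as a position-by-position comparison. This is only an illustration under a strong simplifying assumption: the two sequences are already aligned and of equal length. It is not the sequence comparison algorithm of the invention, and the function name and toy sequences are hypothetical.

```python
def find_differences(first: str, second: str) -> list:
    """Return (position, first_symbol, second_symbol) for each mismatch.

    Positions are 1-based. Both sequences must already be aligned to the
    same length; no gap handling is attempted in this sketch.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(first.upper(), second.upper()), start=1)
        if a != b
    ]


# Hypothetical toy nucleic acid sequences differing at one position.
print(find_differences("ATGGCC", "ATGACC"))
```

Each reported mismatch is a candidate polymorphism between the two inputs; real comparisons would first align the sequences with a dedicated alignment program.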

The invention provides methods for isolating or recovering a nucleicacid encoding a polypeptide having cellulase, e.g., endoglucanase,cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanse,β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activityfrom a sample, e.g., an environmental sample, comprising the steps of:(a) providing an amplification primer sequence pair for amplifying anucleic acid encoding a polypeptide having a cellulase, e.g.,endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase),xylanase, mannanse, β-xylosidase, arabinofuranosidase, and/oroligomerase enzyme activity, wherein the primer pair is capable ofamplifying a nucleic acid of the invention; (b) isolating a nucleic acidfrom the sample, e.g., environmental sample, or treating the sample,e.g., environmental sample, such that nucleic acid in the sample isaccessible for hybridization to the amplification primer pair; and, (c)combining the nucleic acid of step (b) with the amplification primerpair of step (a) and amplifying nucleic acid from the sample, e.g.,environmental sample, thereby isolating or recovering a nucleic acidencoding a polypeptide having a cellulase, e.g., endoglucanase,cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanse,β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activityfrom a sample, e.g., an environmental sample. One or each member of theamplification primer sequence pair can comprise an oligonucleotidecomprising an amplification primer sequence pair of the invention, e.g.,having at least about 10 to 50 consecutive bases of a sequence of theinvention.

The invention provides methods for isolating or recovering a nucleicacid encoding a polypeptide having a cellulase, e.g., endoglucanase,cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanse,β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activityfrom a sample, e.g., an environmental sample, comprising the steps of:(a) providing a polynucleotide probe comprising a nucleic acid of theinvention or a subsequence thereof; (b) isolating a nucleic acid fromthe sample, e.g., environmental sample, or treating the sample, e.g.,environmental sample, such that nucleic acid in the sample is accessiblefor hybridization to a polynucleotide probe of step (a); (c) combiningthe isolated nucleic acid or the treated sample, e.g., environmentalsample, of step (b) with the polynucleotide probe of step (a); and (d)isolating a nucleic acid that specifically hybridizes with thepolynucleotide probe of step (a), thereby isolating or recovering anucleic acid encoding a polypeptide having a cellulase, e.g.,endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase),xylanase, mannanse, β-xylosidase, arabinofuranosidase, and/oroligomerase enzyme activity from a sample, e.g., an environmentalsample. The sample, e.g., environmental sample, can comprise a watersample, a liquid sample, a soil sample, an air sample or a biologicalsample. In one aspect, the biological sample can be derived from abacterial cell, a protozoan cell, an insect cell, a yeast cell, a plantcell, a fungal cell or a mammalian cell.

The invention provides methods of generating a variant of a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity comprising the steps of: (a) providing a template nucleic acid comprising a nucleic acid of the invention; and (b) modifying, deleting or adding one or more nucleotides in the template sequence, or a combination thereof, to generate a variant of the template nucleic acid. In one aspect, the method can further comprise expressing the variant nucleic acid to generate a variant cellulase, e.g., endoglucanase, cellobiohydrolase, β-glucosidase (beta-glucosidase), xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme polypeptide. The modifications, additions or deletions can be introduced by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR), Chromosomal Saturation Mutagenesis (CSM) or a combination thereof. In another aspect, the modifications, additions or deletions are introduced by a method comprising recombination, recursive sequence recombination, phosphorothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.

In one aspect, the method can be iteratively repeated until a cellulase,e.g., endoglucanase, cellobiohydrolase, β-glucosidase(beta-glucosidase), xylanase, mannanse, xylosidase, arabinofuranosidase,and/or oligomerase enzyme having an altered or different activity or analtered or different stability from that of a polypeptide encoded by thetemplate nucleic acid is produced. In one aspect, the variant cellulase,e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase,mannanse, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymepolypeptide is thermotolerant, and retains some activity after beingexposed to an elevated temperature. In another aspect, the variantcellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase,xylanase, mannanse, β-xylosidase, arabinofuranosidase, and/oroligomerase enzyme polypeptide has increased glycosylation as comparedto the cellulase, e.g., endoglucanase, cellobiohydrolase,beta-glucosidase, xylanase, mannanse, β-xylosidase, arabinofuranosidase,and/or oligomerase enzyme encoded by a template nucleic acid.Alternatively, the variant cellulase, e.g., endoglucanase,cellobiohydrolase, beta-glucosidase, xylanase, mannanse, β-xylosidase,arabinofuranosidase, and/or oligomerase polypeptide has a cellulase,e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase,mannanse, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymeactivity under a high temperature, wherein the cellulase, e.g.,endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanse,β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme encoded bythe template nucleic acid is not active under the high temperature. Inone aspect, the method can be iteratively repeated until a cellulase,e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase,mannanse, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymecoding sequence having an altered codon usage from that of the templatenucleic acid is produced. 
In another aspect, the method can be iteratively repeated until a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme gene having a higher or lower level of message expression or stability than that of the template nucleic acid is produced.
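The iterative repetition described above can be pictured as a mutate-and-screen loop. The following is only a minimal sketch under stated assumptions: `point_mutate`, `evolve` and `gc_fraction` are hypothetical names, a single random base substitution stands in for any of the mutagenesis methods listed, and the GC-fraction score is a toy stand-in for a real assay of activity, stability or expression.

```python
import random

BASES = "ACGT"

def point_mutate(seq, rng):
    """Return a copy of seq with one randomly chosen base substituted."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def evolve(template, score, target, max_rounds=1000, seed=0):
    """Repeat mutate-and-screen rounds until score(variant) >= target.

    `score` is a hypothetical stand-in for a laboratory screen; it is an
    assumption of this sketch, not a method taken from the text.
    Returns the best variant found and the number of rounds used.
    """
    rng = random.Random(seed)
    best = template
    for round_no in range(1, max_rounds + 1):
        candidate = point_mutate(best, rng)
        if score(candidate) > score(best):  # keep only improved variants
            best = candidate
        if score(best) >= target:
            return best, round_no
    return best, max_rounds

def gc_fraction(seq):
    """Toy screen: fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

variant, rounds = evolve("ATATATATAT", gc_fraction, target=0.5)
```

In practice the screen would be an enzymatic or expression assay and each round would typically evaluate a library of variants rather than a single candidate.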

The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.

The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity; the method comprising the following steps: (a) providing a nucleic acid of the invention; and, (b) identifying a codon in the nucleic acid of step (a) and replacing it with a different codon encoding the same amino acid as the replaced codon, thereby modifying codons in a nucleic acid encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme.

The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme polypeptide; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.

The invention provides methods for modifying a codon in a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity to decrease its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention; and (b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it with a non-preferred or less preferred codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in a host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to decrease its expression in a host cell. In one aspect, the host cell can be a bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell or a mammalian cell.
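The codon-replacement steps above, in both the expression-increasing and expression-decreasing directions, can be sketched as a synonymous substitution driven by a host codon-usage table. This is an illustrative sketch only: the `USAGE` frequencies describe a made-up host and cover just two amino acids, and the function name `recode` is hypothetical.

```python
# Hypothetical relative codon-usage frequencies for a made-up host cell,
# limited to two amino acids for brevity (assumed values, not real data).
USAGE = {
    "L": {"CTG": 0.50, "TTA": 0.13, "CTA": 0.04},
    "R": {"CGT": 0.38, "AGA": 0.04, "AGG": 0.02},
}

# Reverse lookup: codon -> amino acid, for the codons in the toy table.
CODON_TO_AA = {codon: aa for aa, table in USAGE.items() for codon in table}

def recode(cds, increase=True):
    """Replace each codon with a synonymous codon chosen by host usage.

    increase=True  -> pick the most over-represented (preferred) codon.
    increase=False -> pick the most under-represented (non-preferred) codon.
    Codons absent from the toy table are left unchanged.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    pick = max if increase else min
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            out.append(codon)
        else:
            table = USAGE[aa]
            out.append(pick(table, key=table.get))
    return "".join(out)

optimized = recode("TTAAGAATG")                    # Leu-Arg-Met, rare codons
deoptimized = recode(optimized, increase=False)
```

Because every replacement codon is drawn from the same amino acid's entry in the table, the encoded polypeptide is unchanged; only codon usage differs.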

The invention provides methods for producing a library of nucleic acids encoding a plurality of modified cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme active sites or substrate binding sites, wherein the modified active sites or substrate binding sites are derived from a first nucleic acid comprising a sequence encoding a first active site or a first substrate binding site, the method comprising the following steps: (a) providing a first nucleic acid encoding a first active site or first substrate binding site, wherein the first nucleic acid sequence comprises a sequence that hybridizes under stringent conditions to a nucleic acid of the invention, and the nucleic acid encodes a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme active site or a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme substrate binding site; (b) providing a set of mutagenic oligonucleotides that encode naturally-occurring amino acid variants at a plurality of targeted codons in the first nucleic acid; and, (c) using the set of mutagenic oligonucleotides to generate a set of active site-encoding or substrate binding site-encoding variant nucleic acids encoding a range of amino acid variations at each amino acid codon that was mutagenized, thereby producing a library of nucleic acids encoding a plurality of modified cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme active sites or substrate binding sites.
In one aspect, the method comprises mutagenizing the first nucleic acid of step (a) by a method comprising an optimized directed evolution system, Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR), error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, and a combination thereof. In another aspect, the method comprises mutagenizing the first nucleic acid of step (a) or variants by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.

The invention provides methods for making a small molecule comprising the following steps: (a) providing a plurality of biosynthetic enzymes capable of synthesizing or modifying a small molecule, wherein one of the enzymes comprises a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme encoded by a nucleic acid of the invention; (b) providing a substrate for at least one of the enzymes of step (a); and (c) reacting the substrate of step (b) with the enzymes under conditions that facilitate a plurality of biocatalytic reactions to generate a small molecule by a series of biocatalytic reactions. The invention provides methods for modifying a small molecule comprising the following steps: (a) providing a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, wherein the enzyme comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; (b) providing a small molecule; and (c) reacting the enzyme of step (a) with the small molecule of step (b) under conditions that facilitate an enzymatic reaction catalyzed by the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, thereby modifying a small molecule by a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymatic reaction. In one aspect, the method can comprise a plurality of small molecule substrates for the enzyme of step (a), thereby generating a library of modified small molecules produced by at least one enzymatic reaction catalyzed by the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme.
In one aspect, the method can comprise a plurality of additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by the enzymes to form a library of modified small molecules produced by the plurality of enzymatic reactions. In another aspect, the method can further comprise the step of testing the library to determine if a particular modified small molecule that exhibits a desired activity is present within the library. The step of testing the library can further comprise the steps of systematically eliminating all but one of the biocatalytic reactions used to produce a portion of the plurality of the modified small molecules within the library by testing the portion of the modified small molecule for the presence or absence of the particular modified small molecule with a desired activity, and identifying at least one specific biocatalytic reaction that produces the particular modified small molecule of desired activity.

The invention provides methods for determining a functional fragment of an oligomerase and/or a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme comprising the steps of: (a) providing a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, wherein the enzyme comprises a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; and (b) deleting a plurality of amino acid residues from the sequence of step (a) and testing the remaining subsequence for cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, thereby determining a functional fragment of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity is measured by providing a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product.
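The delete-and-test procedure above can be illustrated by enumerating terminal truncations of a sequence and screening each one. This is a minimal sketch under stated assumptions: only N- and C-terminal deletions are considered, the function names are hypothetical, and the `is_active` predicate (here, presence of an invented motif) stands in for a real activity assay.

```python
def terminal_truncations(protein, min_len=1):
    """Yield (start, end, fragment) for every N- and/or C-terminal deletion.

    The full-length sequence itself is skipped, since at least one
    residue must be deleted.
    """
    n = len(protein)
    for start in range(n):
        for end in range(n, start, -1):
            if end - start >= min_len and (start, end) != (0, n):
                yield start, end, protein[start:end]

def shortest_active_fragment(protein, is_active, min_len=1):
    """Return the shortest truncation that still passes the screen, or None.

    `is_active` is a hypothetical stand-in for measuring substrate
    decrease or product increase in a laboratory assay.
    """
    active = [frag for _, _, frag in terminal_truncations(protein, min_len)
              if is_active(frag)]
    return min(active, key=len) if active else None

# Toy example: pretend activity requires the invented motif "GDWE".
fragment = shortest_active_fragment("MKTAGDWEYLLP", lambda f: "GDWE" in f)
```

An exhaustive scan like this grows quadratically with sequence length, so a real study would usually test a much smaller, rationally chosen set of deletions.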

The invention provides methods for whole cell engineering of new or modified phenotypes by using real-time metabolic flux analysis, the method comprising the following steps: (a) making a modified cell by modifying the genetic composition of a cell, wherein the genetic composition is modified by addition to the cell of a nucleic acid of the invention; (b) culturing the modified cell to generate a plurality of modified cells; (c) measuring at least one metabolic parameter of the cell by monitoring the cell culture of step (b) in real time; and, (d) analyzing the data of step (c) to determine if the measured parameter differs from a comparable measurement in an unmodified cell under similar conditions, thereby identifying an engineered phenotype in the cell using real-time metabolic flux analysis. In one aspect, the genetic composition of the cell can be modified by a method comprising deletion of a sequence or modification of a sequence in the cell, or, knocking out the expression of a gene. In one aspect, the method can further comprise selecting a cell comprising a newly engineered phenotype. In another aspect, the method can comprise culturing the selected cell, thereby generating a new cell strain comprising a newly engineered phenotype.

The invention provides methods of increasing thermotolerance or thermostability of an oligomerase and/or a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme polypeptide, the method comprising glycosylating a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme polypeptide, wherein the polypeptide comprises at least thirty contiguous amino acids of a polypeptide of the invention; or a polypeptide encoded by a nucleic acid sequence of the invention, thereby increasing the thermotolerance or thermostability of the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptide. In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme specific activity can be thermostable or thermotolerant at a temperature in the range from greater than about 37° C. to about 95° C.

The invention provides methods for overexpressing a recombinant oligomerase and/or cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptide in a cell comprising expressing a vector comprising a nucleic acid comprising a nucleic acid of the invention or a nucleic acid sequence of the invention, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, wherein overexpression is effected by use of a high activity promoter, a dicistronic vector or by gene amplification of the vector.

The invention provides methods of making a transgenic plant comprising the following steps: (a) introducing a heterologous nucleic acid sequence into a plant cell, wherein the heterologous nucleic acid sequence comprises a nucleic acid sequence of the invention, thereby producing a transformed plant cell; and (b) producing a transgenic plant from the transformed cell. In one aspect, the step (a) can further comprise introducing the heterologous nucleic acid sequence by electroporation or microinjection of plant cell protoplasts. In another aspect, the step (a) can further comprise introducing the heterologous nucleic acid sequence directly to plant tissue by DNA particle bombardment. Alternatively, the step (a) can further comprise introducing the heterologous nucleic acid sequence into the plant cell DNA using an Agrobacterium tumefaciens host. In one aspect, the plant cell can be a sugar cane, beet, soybean, tomato, potato, corn, rice, wheat, tobacco or barley cell.

The invention provides methods of expressing a heterologous nucleic acid sequence in a plant cell comprising the following steps: (a) transforming the plant cell with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic acid sequence comprises a nucleic acid of the invention; (b) growing the plant under conditions wherein the heterologous nucleic acid sequence is expressed in the plant cell. The invention provides methods of expressing a heterologous nucleic acid sequence in a plant cell comprising the following steps: (a) transforming the plant cell with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic acid sequence comprises a sequence of the invention; (b) growing the plant under conditions wherein the heterologous nucleic acid sequence is expressed in the plant cell.

The invention provides methods for hydrolyzing, breaking up or disrupting a cellooligosaccharide, an arabinoxylan oligomer, or a glucan- or cellulose-comprising composition comprising the following steps: (a) providing a polypeptide of the invention having an oligomerase, a cellulase or a cellulolytic activity; (b) providing a composition comprising a cellulose or a glucan; and (c) contacting the polypeptide of step (a) with the composition of step (b) under conditions wherein the cellulase hydrolyzes, breaks up or disrupts the cellooligosaccharide, arabinoxylan oligomer, or glucan- or cellulose-comprising composition; wherein optionally the composition comprises a plant cell, a bacterial cell, a yeast cell, an insect cell, or an animal cell, and optionally the polypeptide has oligomerase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides feeds or foods comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention. In one aspect, the invention provides a food, feed, a liquid, e.g., a beverage (such as a fruit juice or a beer), a bread or a dough or a bread product, or a beverage precursor (e.g., a wort), comprising a polypeptide of the invention. The invention provides food or nutritional supplements for an animal comprising a polypeptide of the invention, e.g., a polypeptide encoded by the nucleic acid of the invention.

In one aspect, the polypeptide in the food or nutritional supplement can be glycosylated. The invention provides edible enzyme delivery matrices comprising a polypeptide of the invention, e.g., a polypeptide encoded by the nucleic acid of the invention. In one aspect, the delivery matrix comprises a pellet. In one aspect, the polypeptide can be glycosylated. In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity is thermotolerant. In another aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity is thermostable.

The invention provides a food, a feed or a nutritional supplement comprising a polypeptide of the invention. The invention provides methods for utilizing a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme as a nutritional supplement in an animal diet, the method comprising: preparing a nutritional supplement containing a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme comprising at least thirty contiguous amino acids of a polypeptide of the invention; and administering the nutritional supplement to an animal. The animal can be a human, a ruminant or a monogastric animal. The cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme can be prepared by expression of a polynucleotide encoding the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme in an organism selected from the group consisting of a bacterium, a yeast, a plant, an insect, a fungus and an animal. The organism can be selected from the group consisting of an S. pombe, S. cerevisiae, Pichia pastoris, E. coli, Streptomyces sp., Bacillus sp. and Lactobacillus sp.

The invention provides an edible enzyme delivery matrix comprising a thermostable recombinant cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, e.g., a polypeptide of the invention. The invention provides methods for delivering a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme supplement to an animal, the method comprising: preparing an edible enzyme delivery matrix in the form of pellets comprising a granulate edible carrier and a thermostable recombinant cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, wherein the pellets readily disperse the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme contained therein into aqueous media, and administering the edible enzyme delivery matrix to the animal. The recombinant cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme can comprise a polypeptide of the invention. The cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme can be glycosylated to provide thermostability at pelletizing conditions. The delivery matrix can be formed by pelletizing a mixture comprising a grain germ and a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. The pelletizing conditions can include application of steam. The pelletizing conditions can comprise application of a temperature in excess of about 80° C.
for about 5 minutes and the enzyme retains a specific activity of at least 350 to about 900 units per milligram of enzyme.

In one aspect, the invention provides a pharmaceutical composition comprising a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention, or a polypeptide encoded by a nucleic acid of the invention. In one aspect, the pharmaceutical composition acts as a digestive aid.

In certain aspects, a cellulose-containing compound is contacted with a polypeptide of the invention having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity at a pH in the range of between about pH 3.0 to 9.0, 10.0, 11.0 or more. In other aspects, a cellulose-containing compound is contacted with the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme at a temperature of about 55° C., 60° C., 65° C., 70° C., 75° C., 80° C., 85° C., 90° C., or more.

The invention provides methods for delivering an oligomerase and/or a cellulase supplement to an animal, the method comprising: preparing an edible enzyme delivery matrix or pellets comprising a granulate edible carrier and a thermostable recombinant cellulase enzyme, wherein the pellets readily disperse the cellulase enzyme contained therein into aqueous media, and the recombinant cellulase enzyme comprises a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention; and, administering the edible enzyme delivery matrix or pellet to the animal; and optionally the granulate edible carrier comprises a carrier selected from the group consisting of a grain germ, a grain germ that is spent of oil, a hay, an alfalfa, a timothy, a soy hull, a sunflower seed meal and a wheat midd, and optionally the edible carrier comprises grain germ that is spent of oil, and optionally the cellulase enzyme is glycosylated to provide thermostability at pelletizing conditions, and optionally the delivery matrix is formed by pelletizing a mixture comprising a grain germ and a cellulase, and optionally the pelletizing conditions include application of steam, and optionally the pelletizing conditions comprise application of a temperature in excess of about 80° C. for about 5 minutes and the enzyme retains a specific activity of at least 350 to about 900 units per milligram of enzyme.

The invention provides cellulose- or cellulose derivative-compositions comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein optionally the polypeptide has an oligomerase, cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides wood, wood pulp or wood products comprising a cellulase of the invention, or a cellulase encoded by a nucleic acid of the invention, wherein optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides paper, paper pulp or paper products comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein optionally the polypeptide has cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides methods for reducing the amount of cellulose in a paper, a wood or wood product comprising contacting the paper, wood or wood product with a cellulase of the invention, or a cellulase encoded by a nucleic acid of the invention, wherein optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides detergent compositions comprising a cellulase of the invention, or a cellulase encoded by a nucleic acid of the invention, wherein optionally the polypeptide is formulated in a non-aqueous liquid composition, a cast solid, a granular form, a particulate form, a compressed tablet, a gel form, a paste or a slurry form, and optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides pharmaceutical compositions or dietary supplements comprising a cellulase of the invention, or a cellulase encoded by a nucleic acid of the invention, wherein optionally the cellulase is formulated as a tablet, gel, pill, implant, liquid, spray, powder, food, feed pellet or as an encapsulated formulation and optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides fuels comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, wherein optionally the fuel is derived from a plant material, which optionally comprises potatoes, soybean (rapeseed), barley, rye, corn, oats, wheat, beets or sugar cane, and optionally the fuel comprises a bioethanol or a gasoline-ethanol mix.

The invention provides methods for making a fuel comprising contacting a composition comprising a cellulose or a fermentable sugar with a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or any one of the mixtures or “cocktails” or products of manufacture of the invention, wherein optionally the composition comprising a cellulose or a fermentable sugar comprises a plant, plant product or plant derivative, and optionally the plant or plant product comprises cane sugar plants or plant products, beets or sugarbeets, wheat, corn, soybeans, potato, rice or barley, and optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, and optionally the fuel comprises a bioethanol or a gasoline-ethanol mix.

The invention provides methods for making bioethanol comprising contacting a composition comprising a cellulose or a fermentable sugar with a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or any one of the mixtures or “cocktails” or products of manufacture of the invention, wherein optionally the composition comprising a cellulose or a fermentable sugar comprises a plant, plant product or plant derivative, and optionally the plant or plant product comprises cane sugar plants or plant products, beets or sugarbeets, wheat, corn, soybeans, potato, rice or barley, and optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides enzyme ensembles, or “cocktails”, for depolymerization of cellulosic and hemicellulosic polymers to metabolizable carbon moieties comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity. The enzyme ensembles, or “cocktails”, of the invention can be in the form of a composition (e.g., a formulation, liquid or solid), e.g., as a product of manufacture.

The invention provides compositions (including products of manufacture, enzyme ensembles, or “cocktails”) comprising (a) a mixture (or “cocktail”) of hemicellulose- and cellulose-hydrolyzing enzymes, wherein the cellulose-hydrolyzing enzymes comprise at least one of each of an endoglucanase, cellobiohydrolase I (CBH I), cellobiohydrolase II (CBH II) and β-glucosidase; and the hemicellulose-hydrolyzing enzymes comprise at least one of each of a xylanase, β-xylosidase and arabinofuranosidase; (b) a mixture (or “cocktail”) of hemicellulose- and cellulose-hydrolyzing enzymes comprising at least one of each of an endoglucanase, oligomerase, cellobiohydrolase I (CBH I), cellobiohydrolase II (CBH II), arabinofuranosidase and xylanase, wherein optionally the oligomerase is an oligomerase-1 or β-glucosidase, or optionally the oligomerase is an oligomerase-2 or β-xylosidase; (c) a mixture (or “cocktail”) of hemicellulose- and cellulose-hydrolyzing enzymes comprising at least one of each of an endoglucanase; a cellobiohydrolase I (CBH I); a cellobiohydrolase II (CBH II); an arabinofuranosidase; a xylanase; an oligomerase-1 or a β-glucosidase; and, an oligomerase-2 or β-xylosidase; or (d) a mixture (or “cocktail”) of enzymes comprising (1) an endoglucanase which cleaves internal β-1,4 linkages resulting in shorter glucooligosaccharides, (2) a cellobiohydrolase which acts in an “exo” manner processively releasing cellobiose units (β-1,4 glucose-glucose disaccharide), and (3) a β-glucosidase for releasing glucose monomer from short cellooligosaccharides (e.g., cellobiose).
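As an illustration of the "at least one of each" requirement in mixture (a) above, the check below reports which enzyme classes a candidate mixture lacks. It is a bookkeeping sketch only: the class labels, the `REQUIRED_A` table and the function `missing_classes` are hypothetical names chosen for this example.

```python
# Enzyme classes that mixture (a) must contain at least one of each,
# grouped by the polymer they act on (labels are assumed for this sketch).
REQUIRED_A = {
    "cellulose": {"endoglucanase", "CBH I", "CBH II", "beta-glucosidase"},
    "hemicellulose": {"xylanase", "beta-xylosidase", "arabinofuranosidase"},
}

def missing_classes(mixture):
    """Return {group: sorted list of absent classes}; empty dict if complete."""
    present = set(mixture)
    return {
        group: sorted(needed - present)
        for group, needed in REQUIRED_A.items()
        if needed - present
    }

complete = missing_classes([
    "endoglucanase", "CBH I", "CBH II", "beta-glucosidase",
    "xylanase", "beta-xylosidase", "arabinofuranosidase",
])
partial = missing_classes(["endoglucanase", "CBH I", "xylanase"])
```

Here `complete` is an empty dictionary, while `partial` lists the absent cellulose- and hemicellulose-hydrolyzing classes.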

In alternative aspects of the compositions (e.g., enzyme ensembles, or products of manufacture) of the invention (a) the endoglucanase comprises SEQ ID NO:106, the cellobiohydrolase I comprises SEQ ID NO:34 or SEQ ID NO:46, the cellobiohydrolase II comprises SEQ ID NO:98, the β-glucosidase comprises SEQ ID NO:94, the xylanase comprises SEQ ID NO:100, SEQ ID NO:102 or SEQ ID NO:524, the β-xylosidase comprises SEQ ID NO:96, the arabinofuranosidase comprises SEQ ID NO:92 or SEQ ID NO:104, or any combination thereof, wherein SEQ ID NO:106 optionally comprises an additional carbohydrate binding domain; or (b) the mixture comprises an endoglucanase comprising SEQ ID NO:106, an oligomerase-1 comprising SEQ ID NO:522, a cellobiohydrolase I (CBH I) comprising SEQ ID NO:34 or SEQ ID NO:46, a cellobiohydrolase II (CBH II) comprising SEQ ID NO:98, an arabinofuranosidase comprising SEQ ID NO:92, an oligomerase-2 (or β-xylosidase) comprising SEQ ID NO:520, and a xylanase comprising SEQ ID NO:524 or SEQ ID NO:100.

The invention provides compositions or products of manufacture comprising a mixture of enzymes comprising (a) SEQ ID NO:106, a cellobiohydrolase I (CBH I), and a cellobiohydrolase II (CBH II); (b) the mixture of (a), wherein the CBH I is SEQ ID NO:46 or SEQ ID NO:34; (c) the mixture of (a) or (b), wherein the CBH II is SEQ ID NO:98; (d) the mixture of (a), (b), or (c), further comprising an arabinofuranosidase; (e) the mixture of (d), wherein the arabinofuranosidase is SEQ ID NO:92 and/or SEQ ID NO:104; (f) the mixture of (a), (b), (c), (d) or (e), further comprising a xylanase; (g) the mixture of (f), wherein the xylanase is SEQ ID NO:100, SEQ ID NO:102 or SEQ ID NO:524, or a combination thereof; (h) the mixture of (a), (b), (c), (d), (e), (f) or (g), further comprising an oligomerase; (i) the mixture of (h), wherein the oligomerase is SEQ ID NO:520 or SEQ ID NO:522, or a combination thereof; (j) the mixture of (a), (b), (c), (d), (e), (f), (g), (h) or (i), further comprising at least one of SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:264, SEQ ID NO:440 or SEQ ID NO:442, or a combination thereof; or (k) the mixture of (a), (b), (c), (d), (e), (f), (g), (h), (i), or (j), further comprising an endoglucanase, wherein optionally the endoglucanase comprises SEQ ID NO:108, SEQ ID NO:112, SEQ ID NO:114, or SEQ ID NO:116.

The invention provides methods for processing a biomass material comprising lignocellulose comprising contacting a composition comprising a cellulose or a fermentable sugar with a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the biomass material comprising lignocellulose is derived from an agricultural crop, is a byproduct of a food or a feed production, is a lignocellulosic waste product, or is a plant residue or a waste paper or waste paper product, and optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, arabinofuranosidase, and/or oligomerase activity, and optionally the plant residue comprises grain, seeds, stems, leaves, hulls, husks, corn cobs, corn stover, straw, grasses, wherein optionally grasses are Indian grass or switch grass, wood, wood chips, wood pulp and sawdust, and optionally the paper waste comprises discarded or used photocopy paper, computer printer paper, notebook paper, notepad paper, typewriter paper, newspapers, magazines, cardboard and paper-based packaging materials, and optionally the processing of the biomass material generates a bioethanol.

The invention provides dairy products comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the dairy product comprises a milk, an ice cream, a cheese or a yogurt, and optionally the polypeptide has activity comprising oligomerase and/or cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides methods for improving texture and flavor of a dairy product comprising the following steps: (a) providing a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention; (b) providing a dairy product; and (c) contacting the polypeptide of step (a) and the dairy product of step (b) under conditions wherein the cellulase can improve the texture or flavor of the dairy product.

The invention provides textiles or fabrics comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the textile or fabric comprises a cellulose-containing fiber, and optionally the polypeptide has activity comprising oligomerase and/or cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides methods for treating solid or liquid animal waste products comprising the following steps: (a) providing a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the polypeptide has activity comprising oligomerase and/or cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity; (b) providing a solid or a liquid animal waste; and (c) contacting the polypeptide of step (a) and the solid or liquid waste of step (b) under conditions wherein the polypeptide can treat the waste.

The invention provides processed waste products comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the polypeptide has activity comprising oligomerase and/or cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides disinfectants comprising a polypeptide having oligomerase and/or cellulase activity, wherein the polypeptide comprises a sequence of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the polypeptide has activity comprising endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides biodefense or bio-detoxifying agents comprising a polypeptide having oligomerase and/or cellulase activity, wherein the polypeptide comprises a sequence of the invention, or a polypeptide encoded by a nucleic acid of the invention, or an enzyme ensemble, product of manufacture or “cocktail” of the invention, wherein optionally the polypeptide has activity comprising endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The invention provides compositions (including enzyme ensembles and products of manufacture of the invention) comprising a mixture of hemicellulose- and cellulose-hydrolyzing enzymes, wherein the cellulose-hydrolyzing enzymes comprise at least one endoglucanase, cellobiohydrolase I, cellobiohydrolase II and β-glucosidase; and the hemicellulose-hydrolyzing enzymes comprise at least one xylanase, β-xylosidase and arabinofuranosidase. In one aspect, the endoglucanase is EG1_CDCBM3 (SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) plus a carbohydrate binding domain), the cellobiohydrolase I (CBH I) is SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) or SEQ ID NO:46 (encoded by, e.g., SEQ ID NO:45), the cellobiohydrolase II is SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97), the β-glucosidase is SEQ ID NO:94 (encoded by, e.g., SEQ ID NO:93), the xylanase is SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99), SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101) or SEQ ID NO:524 (encoded by, e.g., SEQ ID NO:523), the β-xylosidase is SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95), the arabinofuranosidase is SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91) or SEQ ID NO:104 (encoded by, e.g., SEQ ID NO:103), or a combination thereof.

The invention provides compositions (including enzyme ensembles and products of manufacture of the invention) comprising (a) SEQ ID NO:106, SEQ ID NO:264, a cellobiohydrolase I (CBH I), a cellobiohydrolase II (CBH II), SEQ ID NO:100 or SEQ ID NO:524, SEQ ID NO:96, SEQ ID NO:92, SEQ ID NO:440 and SEQ ID NO:442; (b) SEQ ID NO:106, SEQ ID NO:264, SEQ ID NO:34 or SEQ ID NO:46, SEQ ID NO:98, SEQ ID NO:100 or SEQ ID NO:524, SEQ ID NO:96, SEQ ID NO:92, SEQ ID NO:440, SEQ ID NO:442 and SEQ ID NO:102; (c) SEQ ID NO:98; SEQ ID NO:34 or SEQ ID NO:46; SEQ ID NO:94; SEQ ID NO:100 or SEQ ID NO:524; SEQ ID NO:102; SEQ ID NO:96; SEQ ID NO:92; and, SEQ ID NO:104; or (d) the mixture of (a), (b) or (c) further comprising an endoglucanase, wherein optionally the endoglucanase comprises SEQ ID NO:108, SEQ ID NO:112, SEQ ID NO:114, or SEQ ID NO:116.

The invention provides compositions (including enzyme ensembles and products of manufacture of the invention) comprising a mixture of hemicellulose- and cellulose-hydrolyzing enzymes of the invention, and a biomass material, wherein optionally the biomass material comprises a lignocellulosic material derived from an agricultural crop, or the biomass material is a byproduct of a food or a feed production, or the biomass material is a lignocellulosic waste product, or the biomass material is a plant residue or a waste paper or waste paper product, or the biomass material comprises a plant residue, and optionally the plant residue comprises grains, seeds, stems, leaves, hulls, husks, corn cobs, corn stover, grasses, wherein optionally grasses are Indian grass or switch grass, straw, wood, wood chips, wood pulp and/or sawdust, and optionally the paper waste comprises discarded or used photocopy paper, computer printer paper, notebook paper, notepad paper, typewriter paper, newspapers, magazines, cardboard and paper-based packaging materials.

The invention provides methods for processing a biomass material comprising providing enzyme ensembles (“cocktails”) or products of manufacture of the invention, or a mixture of hemicellulose- and cellulose-hydrolyzing enzymes, wherein the cellulose-hydrolyzing enzymes comprise at least one endoglucanase, cellobiohydrolase I, cellobiohydrolase II and β-glucosidase; and the hemicellulose-hydrolyzing enzymes comprise at least one xylanase, β-xylosidase and arabinofuranosidase, and contacting the mixture of enzymes with the biomass material, wherein optionally the biomass material comprising lignocellulose is derived from an agricultural crop, is a byproduct of a food or a feed production, is a lignocellulosic waste product, or is a plant residue or a waste paper or waste paper product, and optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, and optionally the plant residue comprises grains, seeds, stems, leaves, hulls, husks, corn cobs, corn stover, grasses, wherein optionally grasses are Indian grass or switch grass, straw, wood, wood chips, wood pulp and sawdust, and optionally the paper waste comprises discarded or used photocopy paper, computer printer paper, notebook paper, notepad paper, typewriter paper, newspapers, magazines, cardboard and paper-based packaging materials, and optionally the processing of the biomass material generates a bioethanol. In one aspect, the endoglucanase is EG1_CDCBM3 (SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) plus a carbohydrate binding domain), the cellobiohydrolase I is SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) or SEQ ID NO:46 (encoded by, e.g., SEQ ID NO:45), the cellobiohydrolase II is SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97), the β-glucosidase is SEQ ID NO:94 (encoded by, e.g., SEQ ID NO:93), the xylanase is SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99) or SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101) or SEQ ID NO:524 (encoded by, e.g., SEQ ID NO:523), the β-xylosidase is SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95), the arabinofuranosidase is SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91) or SEQ ID NO:104 (encoded by, e.g., SEQ ID NO:103), or a combination thereof.

The invention provides compositions (including enzyme ensembles (“cocktails”) or products of manufacture of the invention) comprising a mixture of enzymes comprising SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105), SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263), a cellobiohydrolase I (CBH I), a cellobiohydrolase II (CBH II), SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99) or SEQ ID NO:524 (encoded by, e.g., SEQ ID NO:523), SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95), SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91), SEQ ID NO:440 (encoded by, e.g., SEQ ID NO:439) and SEQ ID NO:442 (encoded by, e.g., SEQ ID NO:441). In one aspect, the mixture of enzymes comprises SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33), SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) and SEQ ID NO:104 (encoded by, e.g., SEQ ID NO:103).

The invention provides methods for processing a biomass material comprising providing a mixture of enzymes of the invention (including enzyme ensembles (“cocktails”) or products of manufacture of the invention), and contacting the enzyme mixture with the biomass material, wherein optionally the biomass material comprising lignocellulose is derived from an agricultural crop, is a byproduct of a food or a feed production, is a lignocellulosic waste product, or is a plant residue or a waste paper or waste paper product, and optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, and optionally the plant residue comprises seeds, stems, leaves, hulls, husks, corn cobs, corn stover, grasses, wherein optionally grasses are Indian grass or switch grass, grains, straw, wood, wood chips, wood pulp and sawdust, and optionally the paper waste comprises discarded or used photocopy paper, computer printer paper, notebook paper, notepad paper, typewriter paper, newspapers, magazines, cardboard and paper-based packaging materials, and optionally the processing of the biomass material generates a bioethanol.

The invention provides chimeric polypeptides comprising a first domain and at least a second domain, wherein the first domain comprises an enzyme of the invention, and the second domain comprises a heterologous or modified carbohydrate binding domain or a heterologous or modified dockerin domain, and optionally the carbohydrate binding domain is a cellulose-binding module (CBM) or a lignin-binding domain, and optionally the second domain is appended approximate to the enzyme's catalytic domain, and optionally the second domain is appended approximate to the C-terminus of the enzyme's catalytic domain.

The invention provides compositions comprising a polypeptide having a cellobiohydrolase I activity and a polypeptide having arabinofuranosidase activity, wherein at least one polypeptide having a cellobiohydrolase I activity is SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33).

The details of one or more aspects of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

All publications, patents, patent applications, GenBank sequences and ATCC deposits cited herein are hereby expressly incorporated by reference for all purposes.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings are illustrative of aspects of the invention and are not meant to limit the scope of the invention as encompassed by the claims.

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 is a block diagram of a computer system.

FIG. 2 is a flow diagram illustrating one aspect of a process for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database.

FIG. 3 is a flow diagram illustrating one aspect of a process in a computer for determining whether two sequences are homologous.

FIG. 4 is a flow diagram illustrating one aspect of an identifier process 300 for detecting the presence of a feature in a sequence.

FIG. 5 is an illustration of the structure of cellobiose.

FIGS. 6, 7 and 8 are schematic illustrations of the enzymatically driven pathway for digesting cellulose (FIGS. 6 and 7) and hemicellulose (FIG. 8); as discussed in detail in Example 11, below.

FIG. 9 is a diagram illustrating the variety of likely mutations in a polypeptide that result from the introduction of single point mutations in a polynucleotide encoding said polypeptide by a method such as error-prone PCR. Because replicative errors in a polynucleotide sequence, such as those introduced using error-prone PCR, are unlikely to generate two, much less three, contiguous nucleotide changes, said methods are unlikely to achieve more than 5-7 (on average) codon changes at each codon position. Illustrated is the poor ability of this approach for achieving all possible amino acid changes at each amino acid site along the polypeptide. In contrast, the gene site-saturation mutagenesis (GSSM) approach does achieve a range of codon substitutions (preferably comprising the 32 codons represented by the degenerate cassette sequence N,N,G/T) so as to achieve all possible amino acid changes at each amino acid site along a polypeptide.
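
The codon arithmetic behind FIG. 9 can be checked with a short script. The following is a minimal sketch and not part of the disclosed method: it assumes the standard genetic code, and the names CODON_TABLE, nnk_codons and single_change_neighbors are illustrative only. It enumerates the 32 codons of the N,N,G/T cassette and, for comparison, counts how many different amino acids each sense codon can reach through a single nucleotide change.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def nnk_codons():
    """Codons of the degenerate cassette N,N,G/T (any base, any base, G or T)."""
    return [a + b + c for a in BASES for b in BASES for c in "GT"]

def single_change_neighbors(codon):
    """Amino acids reachable from `codon` by changing exactly one nucleotide."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                reachable.add(CODON_TABLE[mutant])
    # Ignore stop codons and silent changes that keep the original residue.
    return reachable - {"*", CODON_TABLE[codon]}

cassette = nnk_codons()
encoded = {CODON_TABLE[c] for c in cassette} - {"*"}
sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
mean_reachable = sum(len(single_change_neighbors(c)) for c in sense) / len(sense)

print(len(cassette), "cassette codons encode", len(encoded), "amino acids")
print("mean amino acid changes per codon via one nucleotide change:",
      round(mean_reachable, 2))
```

Under these assumptions the 32-codon cassette encodes all 20 amino acids, whereas a single nucleotide change can reach only a subset of residues from any given codon.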

FIG. 10 illustrates in graphic form data showing a typical GIGAMATRIX™ breakout, where active clones expressing enzyme able to hydrolyze methylumbelliferyl cellobioside are identified, as discussed in detail in Example 4, below.

FIG. 11 is a diagram illustrating the use of a gene site-saturation mutagenesis (GSSM) approach for achieving all possible amino acid changes at each amino acid site along the polypeptide.

FIG. 12 is an overview of the GSSM process.

FIG. 13 is an overview of the Gene Discovery & DIRECTEVOLUTION® Technology (Diversa Corporation, San Diego, Calif.) used to develop and practice the invention, as described herein.

FIG. 14 is a technology comparison illustrating the difference between screening using the GIGAMATRIX™ Platform vs. traditional 384-well plates.

FIG. 15 illustrates the size of a single well from a 96-well plate (8 mm diameter (dia.)) compared to approx. 1,000 GIGAMATRIX™ wells (approx. 0.2 mm dia.).
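
The two diameters quoted for FIG. 15 allow a quick consistency check on the figure of approx. 1,000 wells. This is an illustrative calculation only, assuming circular wells and ignoring wall thickness and packing losses; the variable names are not taken from the source.

```python
# Diameters quoted in the FIG. 15 description, expressed in micrometres.
PLATE_WELL_DIAMETER_UM = 8000   # single well of a 96-well plate (8 mm)
SMALL_WELL_DIAMETER_UM = 200    # one GIGAMATRIX well (approx. 0.2 mm)

# Circle area scales with diameter squared, so the pi/4 factor cancels in the ratio.
diameter_ratio = PLATE_WELL_DIAMETER_UM / SMALL_WELL_DIAMETER_UM
area_ratio = diameter_ratio ** 2

print(f"diameter ratio: {diameter_ratio:.0f}")
print(f"area ratio (upper bound on small wells per footprint): {area_ratio:.0f}")
```

The pure area ratio is an upper bound; once spacing between wells is allowed for, a count on the order of 1,000 is plausible.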

FIG. 16 illustrates in graphic form data from the enzymatic treatment of pretreated corn cob, as discussed in detail in Example 5, below.

FIG. 17 illustrates data from the digestion of alkaline pretreated corn stover using 3 different concentrations of an exemplary endoglucanase of the invention; product release (cellobiose and glucose) was monitored over time using an HPLC method, as discussed in detail in Example 7, below.

FIG. 18 illustrates data demonstrating xylose release from high severity alkaline pretreated corn stover (alkPCS) by the exemplary xylanase of the invention, as discussed in detail in Example 7, below.

FIG. 19 illustrates data demonstrating digestion of high severity alkaline pretreated corn stover (alkPCS) by exemplary enzymes of the invention, as discussed in detail in Example 7, below.

FIGS. 20A-20B illustrate data showing both rate and extent of glucose release using combinations of cellobiohydrolase I (CBH I) (FIG. 20A) and cellobiohydrolase II (CBH II) (FIG. 20B) with an exemplary xylanase and an exemplary endoglucanase, as discussed in detail in Example 7, below.

FIG. 21 illustrates data showing the release of glucose at 48 h from pretreated corn stover samples by 20 different endoglucanases, as discussed in detail in Example 5, below.

FIG. 22 illustrates data showing temperature and pH optima of 76 β-glucosidases on p-nitrophenyl-β-glucopyranoside, as discussed in detail in Example 5, below.

FIG. 23 illustrates data showing the digestion of high severity alkaline pretreated corn stover (PCS) by three different enzyme loads of an exemplary xylanase, as discussed in detail in Example 5, below.

FIG. 24 illustrates data from the hydrolysis of xylobiose by eight xylosidases at either 50° C. or 37° C., as discussed in detail in Example 5, below.

FIG. 25 illustrates data showing the release of xylose and arabinose from high severity alkaline pretreated corn stover (PCS) by combinations of xylanase, xylosidase and arabinofuranosidase, as discussed in detail in Example 5, below.

FIG. 26 illustrates data showing the performance of exemplary enzyme cocktails of the invention on low severity alkPCS and alkaline pretreated cobs, as discussed in detail in Example 5, below.

FIG. 27 in table form compares data from SPEZYME® cellulase and the exemplary enzyme cocktail of the invention, E9, on four different pretreated corn samples, as discussed in detail in Example 5, below.

FIG. 28 in table form sets forth data of specific activity of EGs on the soluble cellulose substrate carboxymethyl cellulose (CMC), determined at 37° C., pH 7.0; as discussed in detail in Example 8, below.

FIGS. 29A and 29B illustrate the hydrolysis of AVICEL® by exemplary EGs under normalized conditions at 60° C. (FIG. 29A) and 80° C. (FIG. 29B); as discussed in detail in Example 8, below.

FIG. 30 graphically illustrates data showing the pH and temperature optima of exemplary enzymes on AVICEL® MCC; as discussed in detail in Example 8, below.

FIG. 31 graphically illustrates data showing the pH and temperature optima of 89 β-glucosidases of the invention; as discussed in detail in Example 9, below.

FIG. 32 illustrates phylogenetic trees of CBH genes of the invention; as discussed in detail in Example 9, below.

FIG. 33 illustrates an SDS PAGE of a crude cell extract and an enriched (recombinant) exemplary β-glucosidase activity of the invention following anion exchange chromatography; as discussed in detail in Example 5, below.

FIG. 34 is an illustration of an SDS PAGE of the crude cell extract and the enriched exemplary β-glucosidase following anion exchange chromatography; as discussed in detail in Example 5, below.

FIG. 35 is an illustration of an SDS PAGE of the crude cell extract and an enriched exemplary xylanase enzyme of the invention following cation exchange chromatography; as discussed in detail in Example 5, below.

FIG. 36 is an illustration of an SDS PAGE of the crude cell extract and an enriched exemplary xylanase enzyme of the invention following cation exchange chromatography; as discussed in detail in Example 5, below.

FIG. 37 is an illustration of an SDS PAGE of a crude cell extract and an enriched exemplary enzyme of the invention having β-xylosidase activity following anion exchange chromatography; as discussed in detail in Example 5, below.

FIG. 38 is an illustration of an SDS PAGE of a crude cell extract and enriched exemplary enzyme of the invention having arabinofuranosidase activity following anion exchange chromatography; as discussed in detail in Example 5, below.

FIG. 39 is an illustration of an SDS-PAGE of an exemplary enzyme of the invention having cellobiohydrolase activity enriched on a PAPC affinity ligand; as discussed in detail in Example 5, below.

FIG. 40 is an illustration of an SDS-PAGE of the exemplary family 7 cellobiohydrolase of the invention enriched on size exclusion chromatography; as discussed in detail in Example 5, below.

FIG. 41 is an illustration of a chromatogram that is a representative HPLC trace of the products of biomass digestion by exemplary enzymes of the invention; as discussed in detail in Example 5, below.

FIG. 42 graphically illustrates data obtained from cellulase digestion using exemplary enzymes of the invention using 5% solids in both absolute concentration and percent conversion; as discussed in detail in Example 5, below.

FIG. 43 graphically illustrates data from the digestion of 10% solids using a commercial cellulase plus 7.5 “FPU equivalents”/g commercial xylanase; as discussed in detail in Example 5, below.

FIG. 44 and FIG. 45 each graphically illustrates data demonstrating glucose release from a 5% cellulose solids composition catalyzed by three different exemplary enzyme cocktails of the invention; as discussed in detail in Example 5, below.

FIG. 46 graphically illustrates data demonstrating the digestion of a 10% cellulose solids composition using 58 mg of the exemplary “E9 cocktail” per gram cellulose; as discussed in detail in Example 5, below.

FIG. 47 graphically illustrates data demonstrating the time courses for glucose appearance using 18.1 mg of the exemplary enzyme cocktail “E8” per gram cellulose and 1, 5 and 10% solids (pretreated corn cob); as discussed in detail in Example 5, below.

FIG. 48 graphically illustrates data demonstrating time courses for glucose appearance using 18.1 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (pretreated corn cob); as discussed in detail in Example 5, below.

FIG. 49 graphically illustrates data demonstrating time courses for glucose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (pretreated corn cob); as discussed in detail in Example 5, below.

FIG. 50 graphically illustrates data demonstrating time courses for glucose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (pretreated corn cob); as discussed in detail in Example 5, below.

FIG. 51 graphically illustrates data demonstrating time courses for xylose appearance using 18 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids pretreated corn cob; as discussed in detail in Example 5, below.

FIG. 52 graphically illustrates data demonstrating time courses for xylose appearance using 18 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids pretreated corn cob; as discussed in detail in Example 5, below.

FIG. 53 graphically illustrates data demonstrating time courses for xylose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids pretreated corn cob; as discussed in detail in Example 5, below.

FIG. 54 graphically illustrates data demonstrating time courses for xylose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids pretreated corn cob; as discussed in detail in Example 5, below.

FIGS. 55 and 56 in chart form summarize the data shown in FIGS. 47 to 50 (glucose) and FIGS. 51 to 54 (xylose); as discussed in detail in Example 5, below.

FIGS. 57, 58 and 59 summarize in table form the compositions of exemplary enzyme mixes of the invention: FIG. 57 (Case 1: CBH I/CBH II), FIG. 58 (Case 2: CBH I/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97)), and FIG. 59 (Case 3: SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33)/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97)); as discussed in detail in Example 10, below.

FIG. 60 graphically illustrates data demonstrating glucose release from 5% solids pretreated corn cob catalyzed by three different exemplary E8 cocktails; as discussed in detail in Example 10, below.

FIG. 61 graphically illustrates data demonstrating xylose release from 5% solids pretreated corn cob catalyzed by three different exemplary E8 cocktails; as discussed in detail in Example 10, below.

FIGS. 62 and 63 are schematic illustrations of the enzymatically driven pathway for digesting cellulose (FIG. 63) and hemicellulose (FIG. 62); as discussed in detail in Example 11, below.

FIG. 64 summarizes data from studies of pH and temperature optima of various enzymes on microcrystalline cellulose; as discussed in detail in Example 11, below.

FIG. 65 graphically illustrates data showing the reaction time courses of two exemplary enzymes of the invention with a microcrystalline cellulose and phosphoric acid swollen cellulose; as discussed in detail in Example 11, below.

FIG. 66 graphically illustrates data of studies showing glucose equivalent release from high, medium and low severity alkPCS by various endoglucanases (EGs) of the invention; as discussed in detail in Example 11, below.

FIG. 67 graphically illustrates data of studies showing the dose dependence of an exemplary enzyme of the invention; as discussed in detail in Example 11, below.

FIG. 68 illustrates a schematic of an exemplary automated system of the invention developed to screen large numbers of enzymes and substrates; as discussed in detail in Example 11, below.

FIG. 69 graphically illustrates data of studies showing product detection methods using an exemplary assay of the invention comprising use of a “BCA” (bicinchoninic acid) reducing sugar assay; as discussed in detail in Example 11, below.

FIG. 70 graphically illustrates data of studies showing the results of robotic methods of the invention wherein thousands of assay reactions per day were carried out, the assay comprising use of alkaline PCS and a series of endoglucanases of the invention; as discussed in detail in Example 11, below.

FIG. 71 illustrates data from an HPLC separation of sugar monomers following enzymatic digestion of alkPCS; as discussed in detail in Example 11, below.

FIG. 72 summarizes the capillary electrophoresis separation of cello-oligosaccharides from cellobiose to cellohexaose; as discussed in detail in Example 11, below.

FIG. 73 summarizes the capillary electrophoresis separation of cello-oligosaccharides from cellobiose to cellohexaose; as discussed in detail in Example 11, below.

FIG. 74 graphically illustrates a Michaelis-Menten plot of activity of an exemplary enzyme with the substrate cellobiose; as discussed in detail in Example 11, below.
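
For reference, a Michaelis-Menten plot such as FIG. 74 relates initial rate to substrate concentration through v = Vmax[S]/(Km + [S]). The sketch below simply evaluates that standard equation; the function name and the parameter values are placeholders chosen for illustration and are not kinetic constants from this disclosure.

```python
def michaelis_menten_rate(substrate_conc, v_max, k_m):
    """Initial rate v = Vmax * [S] / (Km + [S]); [S] and Km share the same units."""
    if substrate_conc < 0 or k_m <= 0:
        raise ValueError("substrate_conc must be >= 0 and k_m must be > 0")
    return v_max * substrate_conc / (k_m + substrate_conc)

# Hypothetical values for illustration only (not measured data).
V_MAX = 10.0   # e.g., umol product per min per mg enzyme
K_M = 2.0      # e.g., mM cellobiose

for s in (0.5, 2.0, 8.0, 32.0):
    rate = michaelis_menten_rate(s, V_MAX, K_M)
    print(f"[S] = {s:5.1f} mM -> v = {rate:.2f}")
```

By construction the rate equals half of Vmax when [S] equals Km, and it approaches Vmax as the substrate concentration increases.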

FIG. 75 illustrates an SDS-PAGE analysis, and FIG. 76 graphically illustrates activity assays, of culture broths from a 30 L fermentor to show accumulation of protein and activity; as discussed in detail in Example 11, below.

FIGS. 77A and 77B graphically illustrate data showing the effect of cellulose hydrolysis by combining an exemplary xylanase of the invention with an exemplary endoglucanase of the invention (FIG. 77A) or (FIG. 77B); as discussed in detail in Example 11, below.

FIGS. 78A and 78B graphically illustrate data showing the effect of cellulose hydrolysis using an enzyme mixture of the invention: made by combining an exemplary xylanase of the invention, an exemplary endoglucanase of the invention, an exemplary β-glucosidase of the invention and a CBHI (FIG. 78A) or a CBHII (FIG. 78B).

FIG. 79 graphically illustrates data showing the time courses for three different enzyme loadings with xylan as a substrate, and monitoring xylose and xylobiose as products by HPLC-RI and converting data to “xylose equivalents”; as discussed in detail in Example 11, below.

FIG. 80 graphically illustrates data showing pH and temperature optima of the screened β-glucosidases; as discussed in detail in Example 12, below.

FIG. 81 graphically illustrates data showing glucose inhibition of an exemplary enzyme; as discussed in detail in Example 12, below.

FIG. 82 graphically illustrates data showing digestion of phosphoric acid swollen cellulose (PASC) by recombinant C. heterostrophus strains comprising nucleic acids of the invention; as discussed in detail in Example 12, below.

FIGS. 83A and 83B graphically illustrate data showing activity of exemplary enzymes of the invention is dependent upon number of days in a shake flask; FIG. 83A: PASC activity of 5 different family 6 CBH containing strains during growth in 500 mL shake flasks; FIG. 83B: PASC activity of 4 different family 7 CBH containing strains during growth in 500 mL shake flasks, as discussed in detail in Example 12, below.

FIG. 84 graphically illustrates data showing the progression of percent conversion as different enzymes of the invention were combined; and the figure describes exemplary enzyme mixtures of the invention, e.g., E10, E9, etc.; the figure graphically illustrates the improvement in glucose and xylose conversion as enzymes of the invention are added to the cocktail; as discussed in detail in Example 12, below.

FIGS. 85A, 85B and 85C graphically illustrate digestion of pretreated biomass feedstocks by SPEZYME® enzyme and the exemplary enzyme mix of the invention designated “E9”, by showing the amount of sugar released at 48 hrs; FIG. 85A, glucose released; FIG. 85B, xylose released; FIG. 85C, arabinose released; as discussed in detail in Example 12, below.

FIG. 86 graphically illustrates data showing glucose release from 5% solids pretreated corn cob (5 wt %) during incubation with the exemplary “E8” cocktail supplemented with either T. reesei CBH I and II or exemplary enzymes of the invention; as discussed in detail in Example 12, below.

FIG. 87 and FIG. 88 graphically illustrate data showing digestion ofJaygo 2 (5% solids and 10% solids, respectively) using the celluloseSPEZYME® cellulase plus MULTIFECT® xylanase; as discussed in detail inExample 12, below.

FIG. 89 and FIG. 90 graphically illustrate data showing glucose releasefrom Jaygo 2 (5% solids) catalyzed by three different exemplary enzymemixes of the invention; as discussed in detail in Example 12, below.

FIG. 91 graphically illustrates data showing the digestion of Jaygo 2(10% solids) using the exemplary “E9” enzyme mix of the invention oncellulose; as discussed in detail in Example 12, below.

FIG. 92 graphically illustrates the level of conversion of glucose (G1)and xylose (X1) using 10% solids Jaygo 2 and a number of enzyme recipesthat vary in cellulase and hemicellulase content; as discussed in detailin Example 12, below.

FIG. 93A graphically illustrates percent glucose conversion using exemplary enzyme mixes of the invention and solids (Jaygo 2) loadings.

FIG. 93B graphically illustrates percent xylose conversion at 48 hrs using exemplary enzyme mixes of the invention and solids (Jaygo 2) loadings; as discussed in detail in Example 12, below.

FIG. 94A graphically illustrates xylose release from low severity alkPCS (2.2% solids) by exemplary xylosidase, xylanase and arabinofuranosidase of the invention; as discussed in detail in Example 12, below.

FIG. 94B graphically illustrates arabinose release from low severity alkPCS (2.2% solids) by exemplary xylosidase, xylanase and arabinofuranosidase of the invention; as discussed in detail in Example 12, below.

FIGS. 95A and 95B illustrate chromatograms of the results of using the exemplary enzyme mix of the invention “E10 cocktail” to digest Jaygo 2 (5% solids) after 48 hr incubation (FIG. 95A) and subsequent acid hydrolysis of those liquors (FIG. 95B); as discussed in detail in Example 12, below.

FIG. 96 illustrates an HPLC of fractionated E10 enzyme mix-derived saccharification liquors; as discussed in detail in Example 12, below.

FIG. 97 illustrates the results of a capillary electrophoresis of the fractionation of E10 enzyme mix-digested saccharification liquors (upper panel); the lower panel contains standard mono- and oligosaccharides; as discussed in detail in Example 12, below.

FIG. 98 illustrates an SDS-PAGE of the 6 pectinases tested in exemplary enzyme cocktails of the invention; as discussed in detail in Example 12, below.

FIG. 99 illustrates an SDS-PAGE of six β-glucosidases tested in exemplary E8 cocktails of the invention; as discussed in detail in Example 12, below.

FIG. 100 is an illustration of an HPLC-RI trace of the unfractionated saccharification liquors showing the recalcitrant oligosaccharides (F2), cellobiose (CB), glucose (G), xylose (X) and arabinose (A); as discussed in detail in Example 12, below.

FIG. 101 illustrates an HPLC-RI trace of the fractionated saccharification liquors showing the recalcitrant oligosaccharides (F2), cellobiose (CB), glucose (G), xylose (X) and arabinose (A); as discussed in detail in Example 12, below.

FIG. 102 illustrates an HPLC-RI trace of the sample shown in FIG. 101 with exemplary enzymes of the invention; as discussed in detail in Example 12, below.

FIG. 103 illustrates an HPLC-RI trace of the sample shown in FIG. 102 with the exemplary arabinofuranosidase; as discussed in detail in Example 12, below.

FIG. 104 illustrates an HPLC analysis of the digestion of fractionated soluble oligomers (AX₃) by the exemplary enzymes of the invention; as discussed in detail in Example 12, below.

FIG. 105 illustrates data showing digestion of fractionated soluble oligomers (AX₃) by an E8 cocktail of the invention comprising the exemplary enzymes SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) and SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97); the top panel is substrate only and the bottom panel is after 14 hr enzyme incubation; as discussed in detail in Example 12, below.

FIG. 106 graphically illustrates the hydrolysis of PASC by secreted enzyme SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) of various gene knockouts.

FIG. 107 illustrates the product profile from a 48 hr saccharification of Jaygo 2 by an exemplary enzyme mix of the invention “E8” comprising T. reesei CBH I and II and two enzymes of the invention; as discussed in detail in Example 12, below.

FIG. 108 graphically illustrates enzyme progress curves comparing exemplary enzyme “E8” cocktails (SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33)/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97)) with or without SEQ ID NO:104 (encoded by, e.g., SEQ ID NO:103); as discussed in detail in Example 12, below.

FIG. 109 graphically illustrates the product profile of the exemplary enzyme cocktail “E8”, containing CBH I, CBH II and EG1_CDCBM3 (SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) plus a carbohydrate binding domain); as discussed in detail in Example 12, below.

FIG. 110 illustrates the results of a capillary electrophoresis of APTS labeled arabinoxylan fragments, where #1, 2, and 3 are standard molecules while #5 and 6 are molecules isolated from saccharified liquors; and

FIG. 111 illustrates ¹³C NMR spectra of arabinoxylan fragments; as discussed in detail in Example 12, below.

FIG. 112 illustrates secreted protein (enzyme) activity against the substrate 4-MU-cellobioside of 74 mutagenized Cochliobolus strains; as discussed in detail in Example 12, below.

FIG. 113 illustrates secreted protein activity of mutagenized Cochliobolus strains using the substrate 4-MU-cellobioside; as discussed in detail in Example 12, below.

FIG. 114 illustrates functional and quantitative data from SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) mutant expression and activity studies, wherein the enzymes are expressed in Cochliobolus; as discussed in detail in Example 12, below.

FIG. 115 is an illustration of Western blots of specific wells from FIG. 114, where SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) was grown in a microtiter well plate and enzyme activity assayed on the substrate PASC; as discussed in detail in Example 12, below.

FIG. 116 is an illustration of unaligned electrophoretograms from 48 channels from a 96 channel MegaBACE™ instrument; as discussed in detail in Example 12, below.

FIG. 117 shows data reconfirming the high protein expression and activity of various “over-expressing” Cochliobolus strains; as discussed in detail in Example 12, below.

FIG. 118 is an illustration of an SDS-PAGE of secreted proteins of various “over-expressing” Cochliobolus strains; as discussed in detail in Example 12, below.

FIG. 119 illustrates an SDS-PAGE of secreted proteins of 10 individual transformants of exemplary enzymes SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) and SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) in Aspergillus; as discussed in detail in Example 12, below.

FIG. 120 illustrates an SDS-PAGE of Aspergillus- and Cochliobolus-produced exemplary enzymes of the invention, SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), to compare their production in the two exemplary cell expression systems; as discussed in detail in Example 12, below.

FIG. 121 illustrates time course studies of polysaccharide hydrolysis reactions using exemplary enzyme cocktails of the invention, as discussed in detail, below.

FIGS. 122A and 122B illustrate time course studies of polysaccharide hydrolysis reactions using exemplary enzyme cocktails of the invention, as discussed in detail, below.

FIG. 123 graphically illustrates data showing the percent xylan conversion over time for the exemplary cocktails of the invention, as discussed in detail in Example 13, below.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION OF THE INVENTION

In one aspect, the invention provides polypeptides having any cellulolytic activity, e.g., a cellulase activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or β-glucosidase activity, polynucleotides encoding these polypeptides, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention provides polypeptides having an oligomerase activity, e.g., enzymes that convert soluble oligomers to fermentable monomeric sugars in the saccharification of biomass, e.g., where the activity comprises enzymatic hydrolysis of (to degrade) soluble cellooligosaccharides and arabinoxylan oligomers into monomer xylose, arabinose and glucose; and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. In one aspect, the invention provides thermostable and thermotolerant forms of polypeptides of the invention. The polypeptides of the invention can be used in a variety of pharmaceutical, agricultural and industrial contexts.

In one aspect, the invention provides a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase, with an increased catalytic rate, improving the process of substrate hydrolysis. This increased efficiency in catalytic rate leads to an increased efficiency in producing sugars that will subsequently be used by microorganisms for ethanol production. In one aspect, microorganisms generating enzyme of the invention are used with ethanol-producing microorganisms. Thus, the invention provides methods for ethanol production and making “clean fuels” based on ethanol, e.g., for transportation using bioethanol.

In one aspect the invention provides compositions (e.g., enzyme preparations, feeds, drugs, dietary supplements) comprising the enzymes, polypeptides or polynucleotides of the invention. These compositions can be formulated in a variety of forms, e.g., as liquids, gels, pills, tablets, sprays, powders, food, feed pellets or encapsulated forms, including nanoencapsulated forms.

Assays for measuring cellulase activity, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, e.g., for determining if a polypeptide has cellulase activity, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, are well known in the art and are within the scope of the invention; see, e.g., Baker W L, Panow A, Estimation of cellulase activity using a glucose-oxidase-Cu(II) reducing assay for glucose, J Biochem Biophys Methods. 1991 December, 23(4):265-73; Sharrock K R, Cellulase assay methods: a review, J Biochem Biophys Methods. 1988 Oct. 17(2):81-105; Carder J H, Detection and quantitation of cellulase by Congo red staining of substrates in a cup-plate diffusion assay, Anal Biochem. 1986 Feb. 15, 153(1):75-9; Canevascini G., A cellulase assay coupled to cellobiose dehydrogenase, Anal Biochem. 1985 June, 147(2):419-27; Huang J S, Tang J, Sensitive assay for cellulase and dextranase, Anal Biochem. 1976 June, 73(2):369-77.

The pH of reaction conditions utilized by the invention is another variable parameter for which the invention provides. In certain aspects, the reaction is conducted at a pH in the range of about 3.0 to about 9.0. In other aspects, the pH is about 4.5 or the pH is about 7.5 or the pH is about 9. Reactions conducted under alkaline conditions also can be advantageous, e.g., in some industrial or pharmaceutical applications of enzymes of the invention.

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptides of the invention in a variety of forms and formulations. In the methods of the invention, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptides of the invention are used in a variety of forms and formulations. For example, purified cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptides can be used in enzyme preparations deployed in bioethanol production or in pharmaceutical or dietary aid applications. Alternatively, the enzymes of the invention can be used directly in processes to produce bioethanol, make clean fuels, process biowastes, process foods, liquids or feeds, and the like.

Alternatively, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptides of the invention can be expressed in a microorganism using procedures known in the art. In other aspects, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptides of the invention can be immobilized on a solid support prior to use in the methods of the invention. Methods for immobilizing enzymes on solid supports are commonly known in the art, for example J. Mol. Cat. B: Enzymatic 6 (1999) 29-39; Chivata et al. Biocatalysis: Immobilized cells and enzymes, J Mol. Cat. 37 (1986) 1-24; Sharma et al., Immobilized Biomaterials Techniques and Applications, Angew. Chem. Int. Ed. Engl. 21 (1982) 837-54; Laskin (Ed.), Enzymes and Immobilized Cells in Biotechnology.

Nucleic Acids, Probes and Inhibitory Molecules

The invention provides isolated and recombinant nucleic acids, e.g., see Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing; nucleic acids encoding polypeptides, including the exemplary polynucleotide sequences of the invention, e.g., see Table 1 and Sequence Listing; including expression cassettes such as expression vectors and various cloning vehicles comprising nucleic acids of the invention. The invention also includes methods for discovering, identifying or isolating new cellulases, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptide sequences using the nucleic acids of the invention. The invention also includes methods for inhibiting the expression of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase encoding genes and transcripts using the nucleic acids of the invention.

Also provided are methods for modifying the nucleic acids of the invention, including making variants of nucleic acids of the invention, by, e.g., synthetic ligation reassembly, optimized directed evolution system and/or saturation mutagenesis such as gene site saturation mutagenesis (GSSM). The term “saturation mutagenesis”, “Gene Site Saturation Mutagenesis”, or “GSSM” includes a method that uses degenerate oligonucleotide primers to introduce point mutations into a polynucleotide, as described in detail, below. The term “optimized directed evolution system” or “optimized directed evolution” includes a method for reassembling fragments of related nucleic acid sequences, e.g., related genes, as explained in detail, below. The term “synthetic ligation reassembly” or “SLR” includes a method of ligating oligonucleotide fragments in a non-stochastic fashion, as explained in detail, below. The term “variant” refers to polynucleotides or polypeptides of the invention modified at one or more base pairs, codons, introns, exons, or amino acid residues (respectively) that still retain the biological activity of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase of the invention. Variants can be produced by any number of means, including methods such as, for example, error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, GSSM and any combination thereof.

The nucleic acids of the invention can be made, isolated and/or manipulated by, e.g., cloning and expression of cDNA libraries, amplification of message or genomic DNA by PCR, and the like. For example, exemplary sequences of the invention were initially derived from environmental sources. Thus, in one aspect, the invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme-encoding nucleic acids, and the polypeptides encoded by them, having a common novelty in that they are derived from a common source, e.g., an environmental, mixed culture, or a bacterial source.

In practicing the methods of the invention, homologous genes can be modified by manipulating a template nucleic acid, as described herein. The invention can be practiced in conjunction with any method or protocol or device known in the art, which are well described in the scientific and patent literature.

The phrases “nucleic acid” or “nucleic acid sequence” as used herein refer to an oligonucleotide, nucleotide, polynucleotide, or to a fragment of any of these, to DNA or RNA of genomic or synthetic origin which may be single-stranded or double-stranded and may represent a sense or antisense (complementary) strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material, natural or synthetic in origin. The phrases “nucleic acid” or “nucleic acid sequence” include an oligonucleotide, nucleotide, polynucleotide, or a fragment of any of these, DNA or RNA (e.g., mRNA, rRNA, tRNA, iRNA) of genomic or synthetic origin which may be single-stranded or double-stranded and may represent a sense or antisense strand, peptide nucleic acid (PNA), or any DNA-like or RNA-like material, natural or synthetic in origin, including, e.g., iRNA, ribonucleoproteins (e.g., double stranded iRNAs, e.g., iRNPs). The term encompasses nucleic acids, i.e., oligonucleotides, containing known analogues of natural nucleotides. The term also encompasses nucleic-acid-like structures with synthetic backbones, see e.g., Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197; Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense Nucleic Acid Drug Dev 6:153-156. “Oligonucleotide” includes either a single stranded polydeoxynucleotide or two complementary polydeoxynucleotide strands which may be chemically synthesized. Such synthetic oligonucleotides have no 5′ phosphate and thus will not ligate to another oligonucleotide without adding a phosphate with an ATP in the presence of a kinase. A synthetic oligonucleotide can ligate to a fragment that has not been dephosphorylated.
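The relationship between a sense strand and its antisense (complementary) strand can be sketched in a few lines of Python. This is only an illustration for plain DNA strings; the function name `reverse_complement` and the example sequence are assumptions made here and are not taken from the Sequence Listing.

```python
# Illustrative sketch: derive the antisense (complementary) strand of a DNA
# sequence, written 5' to 3'. Handles only the unambiguous bases A, C, G, T.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, read 5' to 3'."""
    unexpected = set(seq) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


sense = "ATGGCC"  # hypothetical sense-strand fragment
antisense = reverse_complement(sense)
print(antisense)
```

Applying the function twice returns the original strand, which is a convenient sanity check when handling either strand of a double-stranded nucleic acid.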

A “coding sequence of” or a “nucleotide sequence encoding” a particular polypeptide or protein, is a nucleic acid sequence which is transcribed and translated into a polypeptide or protein when placed under the control of appropriate regulatory sequences. The term “gene” means the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as, where applicable, intervening sequences (introns) between individual coding segments (exons). A promoter sequence is “operably linked to” a coding sequence when RNA polymerase which initiates transcription at the promoter will transcribe the coding sequence into mRNA. “Operably linked” as used herein refers to a functional relationship between two or more nucleic acid (e.g., DNA) segments. It can refer to the functional relationship of transcriptional regulatory sequence to a transcribed sequence. For example, a promoter is operably linked to a coding sequence, such as a nucleic acid of the invention, if it stimulates or modulates the transcription of the coding sequence in an appropriate host cell or other expression system. Generally, promoter transcriptional regulatory sequences that are operably linked to a transcribed sequence are physically contiguous to the transcribed sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such as enhancers, need not be physically contiguous or located in close proximity to the coding sequences whose transcription they enhance.

The term “expression cassette” as used herein refers to a nucleotide sequence which is capable of effecting expression of a structural gene (i.e., a protein coding sequence, such as a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention) in a host compatible with such sequences. Expression cassettes include at least a promoter operably linked with the polypeptide coding sequence; and, optionally, with other sequences, e.g., transcription termination signals. Additional factors necessary or helpful in effecting expression may also be used, e.g., enhancers, alpha-factors. Thus, expression cassettes also include plasmids, expression vectors, recombinant viruses, any form of recombinant “naked DNA” vector, and the like. A “vector” comprises a nucleic acid which can infect, transfect, transiently or permanently transduce a cell. It will be recognized that a vector can be a naked nucleic acid, or a nucleic acid complexed with protein or lipid. The vector optionally comprises viral or bacterial nucleic acids and/or proteins, and/or membranes (e.g., a cell membrane, a viral lipid envelope, etc.). Vectors include, but are not limited to replicons (e.g., RNA replicons, bacteriophages) to which fragments of DNA may be attached and become replicated. Vectors thus include, but are not limited to RNA, autonomous self-replicating circular or linear DNA or RNA (e.g., plasmids, viruses, and the like, see, e.g., U.S. Pat. No. 5,217,879), and include both the expression and non-expression plasmids. Where a recombinant microorganism or cell culture is described as hosting an “expression vector” this includes both extra-chromosomal circular and linear DNA and DNA that has been incorporated into the host chromosome(s). Where a vector is being maintained by a host cell, the vector may either be stably replicated by the cells during mitosis as an autonomous structure, or be incorporated within the host's genome.

As used herein, the term “recombinant” encompasses nucleic acids adjacent to a “backbone” nucleic acid to which they are not adjacent in their natural environment. In one aspect, to be “enriched” the nucleic acids will represent about 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. In one aspect, the enriched nucleic acids represent about 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In one aspect, the enriched nucleic acids represent about 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In one aspect, the enriched nucleic acids represent about 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules.

One aspect of the invention is an isolated, synthetic or recombinant nucleic acid comprising one of the sequences of the invention, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive bases of a nucleic acid of the invention. The isolated, synthetic or recombinant nucleic acids may comprise DNA, including cDNA, genomic DNA and synthetic DNA. The DNA may be double-stranded or single-stranded and if single stranded may be the coding strand or non-coding (anti-sense) strand. Alternatively, the isolated, synthetic or recombinant nucleic acids comprise RNA.
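The notion of a fragment comprising at least a given number of consecutive bases of a reference sequence can be illustrated with a simple sliding-window check. The sketch below is an assumption-laden illustration, not a claim-construction tool: the function name `contains_fragment` and both sequences are hypothetical, and only the strand as written is examined (the complementary strand is not searched).

```python
# Illustrative sketch: does `candidate` contain at least `min_len`
# consecutive bases of `reference`? Sequences here are hypothetical.
def contains_fragment(candidate: str, reference: str, min_len: int = 10) -> bool:
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    candidate = candidate.upper()
    reference = reference.upper()
    # Slide a window of min_len bases along the reference and look for an
    # exact occurrence of that window anywhere in the candidate.
    return any(
        reference[i:i + min_len] in candidate
        for i in range(len(reference) - min_len + 1)
    )


reference = "ATGGCTTTAAAGAGACCCGGG"               # hypothetical reference
candidate = "TTTT" + reference[3:15] + "AAAA"     # carries 12 reference bases

print(contains_fragment(candidate, reference, 10))
print(contains_fragment(candidate, reference, 13))
```

Any longer shared run necessarily contains a run of exactly `min_len` bases, so testing windows of that one length is sufficient for an "at least" threshold.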

The isolated, synthetic or recombinant nucleic acids of the invention may be used to prepare one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides of the invention. Accordingly, another aspect of the invention is an isolated, synthetic or recombinant nucleic acid which encodes one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides of the invention. The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of the invention or may be different coding sequences which encode one of the polypeptides of the invention having at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides of the invention, as a result of the redundancy or degeneracy of the genetic code. The genetic code is well known to those of skill in the art and can be obtained, e.g., on page 214 of B. Lewin, Genes VI, Oxford University Press, 1997.
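The degeneracy of the genetic code can be made concrete with a short Python sketch that translates two different coding sequences and compares the encoded polypeptides. The standard genetic code is assumed; the helper names (`translate`, `encode_same_polypeptide`) and the two short sequences are illustrative assumptions and are not sequences of the invention.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence (length divisible by 3) to protein."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def encode_same_polypeptide(cds_a: str, cds_b: str) -> bool:
    """True if two coding sequences translate to the same amino acids."""
    return translate(cds_a) == translate(cds_b)


seq_a = "ATGGCTTTAAAA"  # hypothetical: Met-Ala-Leu-Lys
seq_b = "ATGGCCCTGAAG"  # different codons, same amino acids

print(translate(seq_a), translate(seq_b))
print(encode_same_polypeptide(seq_a, seq_b))
```

The two sequences differ at four positions yet encode the same four residues, which is the sense in which different coding sequences can encode one polypeptide.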

The nucleic acids encoding polypeptides of the invention include but are not limited to: the coding sequence of a nucleic acid of the invention and additional coding sequences, such as leader sequences or proprotein sequences and non-coding sequences, such as introns or non-coding sequences 5′ and/or 3′ of the coding sequence. Thus, as used herein, the term “polynucleotide encoding a polypeptide” encompasses a polynucleotide which includes the coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequence.

In one aspect, the nucleic acid sequences of the invention are mutagenized using conventional techniques, such as site directed mutagenesis, or other techniques familiar to those skilled in the art, to introduce silent changes into the polynucleotides of the invention. As used herein, “silent changes” include, for example, changes which do not alter the amino acid sequence encoded by the polynucleotide. Such changes may be desirable in order to increase the level of the polypeptide produced by host cells containing a vector encoding the polypeptide by introducing codons or codon pairs which occur frequently in the host organism.
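A minimal sketch of introducing silent changes is given below: each codon is swapped for a synonymous codon drawn from a table of host-preferred codons, and the translation is checked to be unchanged. The standard genetic code is assumed. The preference table `PREFERRED`, the function names and the input sequence are hypothetical placeholders chosen for illustration; real codon preferences depend on the host organism and are not specified here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical host-preferred codons (illustrative values only).
PREFERRED = {"A": "GCG", "L": "CTG", "K": "AAA", "R": "CGT"}


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode_silently(cds: str, preferred: dict) -> str:
    """Replace codons with preferred synonymous codons; protein is unchanged."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        # Guard against a bad preference table: the swap must be synonymous.
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        out.append(new_codon)
    return "".join(out)


original = "ATGGCTTTAAAGAGA"  # hypothetical: Met-Ala-Leu-Lys-Arg
recoded = recode_silently(original, PREFERRED)
print(recoded, translate(original) == translate(recoded))
```

The check inside the loop is the design point: a change counts as silent only if the replacement codon encodes the same amino acid, so a mistaken entry in the preference table is rejected rather than silently altering the polypeptide.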

The invention also relates to polynucleotides which have nucleotide changes which result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptides of the invention. Such nucleotide changes may be introduced using techniques such as site directed mutagenesis, random chemical mutagenesis, exonuclease III deletion and other recombinant DNA techniques. Alternatively, such nucleotide changes may be naturally occurring allelic variants which are isolated by identifying nucleic acids which specifically hybridize to probes comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of the invention (or the sequences complementary thereto) under conditions of high, moderate, or low stringency as provided herein.

General Techniques

The nucleic acids used to practice this invention, whether RNA, siRNA, miRNA, antisense nucleic acid, cDNA, genomic DNA, vectors, viruses or hybrids thereof, may be isolated from a variety of sources, genetically engineered, amplified, and/or expressed/generated recombinantly. Recombinant polypeptides (e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes) generated from these nucleic acids can be individually isolated or cloned and tested for a desired activity. Any recombinant expression system can be used, including bacterial, mammalian, yeast, insect or plant cell expression systems.

Alternatively, these nucleic acids can be synthesized in vitro by well-known chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. Pat. No. 4,458,066.

Techniques for the manipulation of nucleic acids, such as, e.g., subcloning, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick translation, amplification), sequencing, hybridization and the like are well described in the scientific and patent literature, see, e.g., Sambrook, ed., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed. John Wiley & Sons, Inc., New York (1997); LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY: HYBRIDIZATION WITH NUCLEIC ACID PROBES, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993).

Another useful means of obtaining and manipulating nucleic acids used to practice the methods of the invention is to clone from genomic samples, and, if desired, screen and re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones. Sources of nucleic acid used in the methods of the invention include genomic or cDNA libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S. Pat. Nos. 5,721,118; 6,025,155; human artificial chromosomes, see, e.g., Rosenfeld (1997) Nat. Genet. 15:333-335; yeast artificial chromosomes (YAC); bacterial artificial chromosomes (BAC); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics 50:306-316; P1-derived vectors (PACs), see, e.g., Kern (1997) Biotechniques 23:120-124; cosmids, recombinant viruses, phages or plasmids.

In one aspect, a nucleic acid encoding a polypeptide of the invention is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof.

The invention provides fusion proteins and nucleic acids encoding them. A polypeptide of the invention can be fused to a heterologous peptide or polypeptide, such as N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified purification. Peptides and polypeptides of the invention can also be synthesized and expressed as fusion proteins with one or more additional domains linked thereto for, e.g., producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like. Detection and purification facilitating domains include, e.g., metal chelating peptides such as polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle Wash.). The inclusion of cleavable linker sequences, such as Factor Xa or enterokinase (Invitrogen, San Diego Calif.), between a purification domain and the motif-comprising peptide or polypeptide can facilitate purification. For example, an expression vector can include an epitope-encoding nucleic acid sequence linked to six histidine residues followed by a thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry 34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues facilitate detection and purification while the enterokinase cleavage site provides a means for purifying the epitope from the remainder of the fusion protein. Technology pertaining to vectors encoding fusion proteins and application of fusion proteins are well described in the scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol., 12:441-53.
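The layout of a tagged fusion with a cleavable linker can be sketched at the amino acid level. The example below is a simplified illustration that omits the thioredoxin and epitope elements mentioned above: it assumes a six-histidine tag, the commonly cited enterokinase recognition sequence DDDDK (cleavage after the lysine), and a hypothetical target peptide. The function names are illustrative, and the model ignores any real-world sequence context effects on cleavage.

```python
# Illustrative sketch of a purification-tag fusion and its cleavage.
HIS_TAG = "HHHHHH"   # six histidine residues
EK_SITE = "DDDDK"    # assumed enterokinase site; cut falls after the K


def build_fusion(target: str, tag: str = HIS_TAG, site: str = EK_SITE) -> str:
    """N-terminal tag, then cleavable linker, then the target polypeptide."""
    return tag + site + target


def cleave_after_site(fusion: str, site: str = EK_SITE) -> tuple:
    """Split the fusion immediately after the first occurrence of `site`."""
    idx = fusion.find(site)
    if idx < 0:
        raise ValueError("cleavage site not found in fusion")
    cut = idx + len(site)
    return fusion[:cut], fusion[cut:]


target = "MALKR"  # hypothetical target peptide
fusion = build_fusion(target)
tag_part, released = cleave_after_site(fusion)
print(fusion, tag_part, released)
```

Placing the site directly upstream of the target means the released product carries no residual tag residues, which is one reason a linker of this kind is positioned between the purification domain and the polypeptide of interest.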

Transcriptional and Translational Control Sequences

The invention provides nucleic acid (e.g., DNA) sequences of the invention operatively linked to expression (e.g., transcriptional or translational) control sequence(s), e.g., promoters or enhancers, to direct or modulate RNA synthesis/expression. The expression control sequence can be in an expression vector. Exemplary bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Exemplary eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein I.

As used herein, the term “promoter” includes all sequences capable of driving transcription of a coding sequence in a cell, e.g., a plant or animal cell. Thus, promoters used in the constructs of the invention include cis-acting transcriptional control elements and regulatory sequences that are involved in regulating or modulating the timing and/or rate of transcription of a gene. For example, a promoter can be a cis-acting transcriptional control element, including an enhancer, a promoter, a transcription terminator, an origin of replication, a chromosomal integration sequence, 5′ and 3′ untranslated regions, or an intronic sequence, which are involved in transcriptional regulation. These cis-acting sequences can interact with proteins or other biomolecules to carry out (turn on/off, regulate, modulate, etc.) transcription. “Constitutive” promoters are those that drive expression continuously under most environmental conditions and states of development or cell differentiation. “Inducible” or “regulatable” promoters direct expression of the nucleic acid of the invention under the influence of environmental conditions or developmental conditions. Examples of environmental conditions that may affect transcription by inducible promoters include anaerobic conditions, elevated temperature, drought, or the presence of light.

“Tissue-specific” promoters are transcriptional control elements that are only active in particular cells or tissues or organs, e.g., in plants or animals. Tissue-specific regulation may be achieved by certain intrinsic factors which ensure that genes encoding proteins specific to a given tissue are expressed. Such factors are known to exist in mammals and plants so as to allow for specific tissues to develop.

Promoters suitable for expressing a polypeptide or fragment thereof in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda P_(R) promoter, the lambda P_(L) promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Fungal promoters include the α-factor promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used.

Tissue-Specific Plant Promoters

The invention provides expression cassettes that can be expressed in a tissue-specific manner, e.g., that can express a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention in a tissue-specific manner. The invention also provides plants or seeds that express a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention in a tissue-specific manner. The tissue-specificity can be seed specific, stem specific, leaf specific, root specific, fruit specific and the like.

The term “plant” includes whole plants, plant parts (e.g., leaves, stems, flowers, roots, etc.), plant protoplasts, seeds and plant cells and progeny of same. The class of plants which can be used in the method of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), as well as gymnosperms. It includes plants of a variety of ploidy levels, including polyploid, diploid, haploid and hemizygous states. As used herein, the term “transgenic plant” includes plants or plant cells into which a heterologous nucleic acid sequence has been inserted, e.g., the nucleic acids and various recombinant constructs (e.g., expression cassettes) of the invention.

In one aspect, a constitutive promoter such as the CaMV 35S promoter can be used for expression in specific parts of the plant or seed or throughout the plant. For example, for overexpression, a plant promoter fragment can be employed which will direct expression of a nucleic acid in some or all tissues of a plant, e.g., a regenerated plant. Such promoters are referred to herein as “constitutive” promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1′- or 2′-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill. Such genes include, e.g., ACT11 from Arabidopsis (Huang (1996) Plant Mol. Biol. 33:125-139); Cat3 from Arabidopsis (GenBank No. U43147, Zhong (1996) Mol. Gen. Genet. 251:196-203); the gene encoding stearoyl-acyl carrier protein desaturase from Brassica napus (GenBank No. X74782, Solocombe (1994) Plant Physiol. 104:1167-1176); GPc1 from maize (GenBank No. X15596; Martinez (1989) J. Mol. Biol. 209:551-565); the Gpc2 from maize (GenBank No. U45855, Manjunath (1997) Plant Mol. Biol. 33:97-112); plant promoters described in U.S. Pat. Nos. 4,962,028; 5,633,440.

The invention uses tissue-specific or constitutive promoters derived from viruses which can include, e.g., the tobamovirus subgenomic promoter (Kumagai (1995) Proc. Natl. Acad. Sci. USA 92:1679-1683); the rice tungro bacilliform virus (RTBV), which replicates only in phloem cells in infected rice plants, with its promoter which drives strong phloem-specific reporter gene expression; the cassava vein mosaic virus (CVMV) promoter, with highest activity in vascular elements, in leaf mesophyll cells, and in root tips (Verdaguer (1996) Plant Mol. Biol. 31:1129-1139).

In one aspect, the plant promoter directs expression of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme-expressing nucleic acid in a specific tissue, organ or cell type (i.e., tissue-specific promoters) or may be otherwise under more precise environmental or developmental control or under the control of an inducible promoter. Examples of environmental conditions that may affect transcription include anaerobic conditions, elevated temperature, the presence of light, or spraying with chemicals/hormones. For example, the invention incorporates the drought-inducible promoter of maize (Busk (1997) supra); the cold, drought, and high salt inducible promoter from potato (Kirch (1997) Plant Mol. Biol. 33:897-909).

In one aspect, tissue-specific promoters promote transcription only within a certain time frame of developmental stage within that tissue. See, e.g., Blazquez (1998) Plant Cell 10:791-800, characterizing the Arabidopsis LEAFY gene promoter. See also Cardon (1997) Plant J. 12:367-77, describing the transcription factor SPL3, which recognizes a conserved sequence motif in the promoter region of the A. thaliana floral meristem identity gene AP1; and Mandel (1995) Plant Molecular Biology, Vol. 29, pp 995-1004, describing the meristem promoter eIF4. Tissue specific promoters which are active throughout the life cycle of a particular tissue can be used. In one aspect, the nucleic acids of the invention are operably linked to a promoter active primarily only in cotton fiber cells. In one aspect, the nucleic acids of the invention are operably linked to a promoter active primarily during the stages of cotton fiber cell elongation, e.g., as described by Rinehart (1996) supra. The nucleic acids can be operably linked to the Fb12A gene promoter to be preferentially expressed in cotton fiber cells (Ibid). See also, John (1997) Proc. Natl. Acad. Sci. USA 89:5769-5773; John, et al., U.S. Pat. Nos. 5,608,148 and 5,602,321, describing cotton fiber-specific promoters and methods for the construction of transgenic cotton plants. Root-specific promoters may also be used to express the nucleic acids of the invention. Examples of root-specific promoters include the promoter from the alcohol dehydrogenase gene (DeLisle (1990) Int. Rev. Cytol. 123:39-60). Other promoters that can be used to express the nucleic acids of the invention include, e.g., ovule-specific, embryo-specific, endosperm-specific, integument-specific, seed coat-specific promoters, or some combination thereof; a leaf-specific promoter (see, e.g., Busk (1997) Plant J. 11:1285-1295, describing a leaf-specific promoter in maize); the ORF13 promoter from Agrobacterium rhizogenes (which exhibits high activity in roots, see, e.g., Hansen (1997) supra); a maize pollen specific promoter (see, e.g., Guerrero (1990) Mol. Gen. Genet. 224:161-168); a tomato promoter active during fruit ripening, senescence and abscission of leaves and, to a lesser extent, of flowers (see, e.g., Blume (1997) Plant J. 12:731-746); a pistil-specific promoter from the potato SK2 gene (see, e.g., Ficker (1997) Plant Mol. Biol. 35:425-431); the Blec4 gene from pea, which is active in epidermal tissue of vegetative and floral shoot apices of transgenic alfalfa, making it a useful tool to target the expression of foreign genes to the epidermal layer of actively growing shoots or fibers; the ovule-specific BEL1 gene (see, e.g., Reiser (1995) Cell 83:735-742, GenBank No. U39944); and/or, the promoter in Klee, U.S. Pat. No. 5,589,583, describing a plant promoter region that is capable of conferring high levels of transcription in meristematic tissue and/or rapidly dividing cells.

In one aspect, plant promoters which are inducible upon exposure to plant hormones, such as auxins, are used to express the nucleic acids of the invention. For example, the invention can use the auxin-response elements E1 promoter fragment (AuxREs) in the soybean (Glycine max L.) (Liu (1997) Plant Physiol. 115:397-407); the auxin-responsive Arabidopsis GST6 promoter (also responsive to salicylic acid and hydrogen peroxide) (Chen (1996) Plant J. 10:955-966); the auxin-inducible parC promoter from tobacco (Sakai (1996) 37:906-913); a plant biotin response element (Streit (1997) Mol. Plant Microbe Interact. 10:933-937); and, the promoter responsive to the stress hormone abscisic acid (Sheen (1996) Science 274:1900-1902).

The nucleic acids of the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents which can be applied to the plant, such as herbicides or antibiotics. For example, the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, can be used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem. Coding sequence can be under the control of, e.g., a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324). Using chemically- (e.g., hormone- or pesticide-) induced promoters, i.e., promoters responsive to a chemical which can be applied to the transgenic plant in the field, expression of a polypeptide of the invention can be induced at a particular stage of development of the plant. Thus, the invention also provides for transgenic plants containing an inducible gene encoding for polypeptides of the invention whose host range is limited to target plant species, such as corn, rice, barley, soybean, tomato, wheat, potato or other crops, inducible at any stage of development of the crop.

One of skill will recognize that a tissue-specific plant promoter may drive expression of operably linked sequences in tissues other than the target tissue. Thus, in one aspect, a tissue-specific promoter is one that drives expression preferentially in the target tissue or cell type, but may also lead to some expression in other tissues as well.

The nucleic acids of the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents. These reagents include, e.g., herbicides, synthetic auxins, or antibiotics which can be applied, e.g., sprayed, onto transgenic plants. Inducible expression of the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme-producing nucleic acids of the invention will allow the grower to select plants with the optimal cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme expression and/or activity. The development of plant parts can thus be controlled. In this way the invention provides the means to facilitate the harvesting of plants and plant parts. For example, in various embodiments, the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, is used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem. Coding sequences of the invention are also under the control of a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324).

In some aspects, proper polypeptide expression may require a polyadenylation region at the 3′-end of the coding region. The polyadenylation region can be derived from the natural gene, from a variety of other plant (or animal or other) genes, or from genes in the Agrobacterial T-DNA.

Expression Vectors and Cloning Vehicles

The invention provides expression vectors and cloning vehicles comprising nucleic acids of the invention, e.g., sequences encoding the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention. Expression vectors and cloning vehicles of the invention can comprise viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral DNA (e.g., vaccinia, adenovirus, fowl pox virus, pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for specific hosts of interest (such as Bacillus, Aspergillus and yeast). Vectors of the invention can include chromosomal, non-chromosomal and synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. Exemplary vectors include: bacterial: pQE™ vectors (Qiagen), pBLUESCRIPT™ plasmids, pNH vectors, lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540, pRIT2T (Pharmacia); eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). However, any other plasmid or other vector may be used so long as it is replicable and viable in the host. Low copy number or high copy number vectors may be employed with the present invention. “Plasmids” can be commercially available, publicly available on an unrestricted basis, or can be constructed from available plasmids in accord with published procedures. Equivalent plasmids to those described herein are known in the art and will be apparent to the ordinarily skilled artisan.

The expression vector can comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. Mammalian expression vectors can comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′ flanking non-transcribed sequences. In some aspects, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed genetic elements.

In one aspect, the expression vectors contain one or more selectable marker genes to permit selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene. Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers.

In one aspect, vectors for expressing the polypeptide or fragment thereof in eukaryotic cells contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA that can be from about 10 to about 300 bp in length. They can act on a promoter to increase its transcription. Exemplary enhancers include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and the adenovirus enhancers.

A nucleic acid sequence can be inserted into a vector by a variety of procedures. In general, the sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. A variety of cloning techniques are known in the art, e.g., as described in Ausubel and Sambrook. Such procedures and others are deemed to be within the scope of those skilled in the art.
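
The digestion-and-ligation step described above can be sketched in silico. The following Python example is a minimal illustration that assumes a single enzyme, EcoRI (recognition sequence GAATTC, cutting after the first G on the top strand), and models only the top strand of linear molecules. The vector and donor sequences and the helper names `digest` and `ligate_insert` are hypothetical and are not taken from the disclosure.

```python
import re

# EcoRI recognizes GAATTC and cuts after the first G on the top strand.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1


def digest(seq, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Cut a linear top-strand sequence at every occurrence of the site."""
    seq = seq.upper()
    # A lookahead is used so that overlapping sites would also be found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def ligate_insert(vector_left, insert, vector_right):
    """Join compatible fragments; top strand only, orientation not checked."""
    return vector_left + insert + vector_right


vector = "TTTTGAATTCCCCC"                # hypothetical vector with one EcoRI site
donor = "AAGAATTCATGGCCTAAGAATTCGG"      # hypothetical insert flanked by EcoRI sites

left, right = digest(vector)             # linearized vector arms
_, insert, _ = digest(donor)             # excised fragment with compatible ends
recombinant = ligate_insert(left, insert, right)
```

Because both molecules are cut with the same enzyme, the ligated product regenerates the recognition sequence at each junction. A real cloning design would also need to consider insert orientation, vector self-ligation and the reverse strand, none of which this sketch models.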

The vector can be in the form of a plasmid, a viral particle, or a phage. Other vectors include chromosomal, non-chromosomal and synthetic DNA sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by, e.g., Sambrook.

Particular bacterial vectors which can be used include the commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, Wis., USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBLUESCRIPT II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is replicable and viable in the host cell.

The nucleic acids of the invention can be expressed in expression cassettes, vectors or viruses and transiently or stably expressed in plant cells and seeds. One exemplary transient expression system uses episomal expression systems, e.g., cauliflower mosaic virus (CaMV) viral RNA generated in the nucleus by transcription of an episomal mini-chromosome containing supercoiled DNA, see, e.g., Covey (1990) Proc. Natl. Acad. Sci. USA 87:1633-1637. Alternatively, coding sequences, i.e., all or sub-fragments of sequences of the invention can be inserted into a plant host cell genome becoming an integral part of the host chromosomal DNA. Sense or antisense transcripts can be expressed in this manner. A vector comprising the sequences (e.g., promoters or coding regions) from nucleic acids of the invention can comprise a marker gene that confers a selectable phenotype on a plant cell or a seed. For example, the marker may encode biocide resistance, e.g., antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or Basta.

Expression vectors capable of expressing nucleic acids and proteins in plants are well known in the art, and can include, e.g., vectors from Agrobacterium spp., potato virus X (see, e.g., Angell (1997) EMBO J. 16:3675-3684), tobacco mosaic virus (see, e.g., Casper (1996) Gene 173:69-73), tomato bushy stunt virus (see, e.g., Hillman (1989) Virology 169:42-50), tobacco etch virus (see, e.g., Dolja (1997) Virology 234:243-252), bean golden mosaic virus (see, e.g., Morinaga (1993) Microbiol Immunol. 37:471-476), cauliflower mosaic virus (see, e.g., Cecchini (1997) Mol. Plant Microbe Interact. 10:1094-1101), maize Ac/Ds transposable element (see, e.g., Rubin (1997) Mol. Cell. Biol. 17:6294-6302; Kunze (1996) Curr. Top. Microbiol. Immunol. 204:161-194), and the maize suppressor-mutator (Spm) transposable element (see, e.g., Schlappi (1996) Plant Mol. Biol. 32:717-725); and derivatives thereof.

In one aspect, the expression vector can have two replication systems to allow it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector can contain at least one sequence homologous to the host cell genome. It can contain two homologous sequences which flank the expression construct. The integrating vector can be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.

Expression vectors of the invention may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed, e.g., genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers can also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct RNA synthesis. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda P_(R), P_(L) and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers. In addition, the expression vectors in one aspect contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

Mammalian expression vectors may also comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences and 5′ flanking nontranscribed sequences. In some aspects, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells may also contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in length, that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin and the adenovirus enhancers.

In addition, the expression vectors can contain one or more selectable marker genes to permit selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli and the S. cerevisiae TRP1 gene.

In some aspects, the nucleic acid encoding one of the polypeptides of the invention, or fragments comprising at least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof, is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof. In one aspect, the nucleic acid can encode a fusion polypeptide in which one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof, is fused to heterologous peptides or polypeptides, such as N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified purification.

The appropriate DNA sequence may be inserted into the vector by a variety of procedures. In general, the DNA sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. A variety of cloning techniques are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual 2nd Ed., Cold Spring Harbor Laboratory Press (1989). Such procedures and others are deemed to be within the scope of those skilled in the art.

The vector may be, for example, in the form of a plasmid, a viral particle, or a phage. Other vectors include chromosomal, nonchromosomal and synthetic DNA sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor, N.Y., (1989).

Host Cells and Transformed Cells

The invention also provides a transformed cell comprising a nucleic acid sequence of the invention, e.g., a sequence encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention, or a vector of the invention. The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, such as bacterial cells, fungal cells, yeast cells, mammalian cells, insect cells, or plant cells. Exemplary bacterial cells include any species of Streptomyces, Pseudomonas, Staphylococcus or Bacillus, or the exemplary species E. coli, Bacillus subtilis, Bacillus cereus, Salmonella typhimurium. Exemplary insect cells include any species of Spodoptera or Drosophila, including Drosophila S2 and Spodoptera Sf9. Exemplary animal cells include CHO, COS or Bowes melanoma or any mouse or human cell line. The selection of an appropriate host is within the abilities of those skilled in the art. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Weising (1988) Ann. Rev. Genet. 22:421-477; U.S. Pat. No. 5,750,870.

The vector can be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).

In one aspect, the nucleic acids or vectors of the invention are introduced into the cells for screening; thus, the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type. Exemplary methods include CaPO₄ precipitation, liposome fusion, lipofection (e.g., LIPOFECTIN™), electroporation, viral infection, etc. The candidate nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction) or may exist either transiently or stably in the cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). As many pharmaceutically important screens require human or model mammalian cell targets, retroviral vectors capable of transfecting such targets can be used.

Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the invention. Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.

Cells can be harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide. If desired, high performance liquid chromatography (HPLC) can be employed for final purification steps.

The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Depending upon the host employed in a recombinant production procedure, the polypeptides produced by host cells containing the vector may be glycosylated or may be non-glycosylated. Polypeptides of the invention may or may not also include an initial methionine amino acid residue.

Cell-free translation systems can also be employed to produce a polypeptide of the invention. Cell-free translation systems can use mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some aspects, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.
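
The information flow in the two steps above (DNA construct to mRNA, then mRNA to polypeptide) can be illustrated with a minimal Python sketch. It assumes the standard genetic code, treats the input as the coding strand downstream of the promoter, and begins translation at the first AUG. The example sequence and the function names `transcribe` and `translate` are hypothetical and serve only as illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna):
    """Return the mRNA corresponding to a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if residue == "*":   # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


coding_dna = "GGATGGCCAAATAAGG"   # hypothetical transcribed region of a construct
mrna = transcribe(coding_dna)
polypeptide = translate(mrna)
```

The sketch models sequence content only; it does not represent promoter recognition, linearization of the template, or the biochemical requirements of a translation extract.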

The expression vectors can contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells, such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

Host cells containing the polynucleotides of interest, e.g., nucleic acids of the invention, can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying genes. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression and will be apparent to the ordinarily skilled artisan. The clones which are identified as having the specified enzyme activity may then be sequenced to identify the polynucleotide sequence encoding an enzyme having the enhanced activity.

The invention provides a method for overexpressing a recombinant cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme in a cell comprising expressing a vector comprising a nucleic acid of the invention, e.g., a nucleic acid comprising a nucleic acid sequence with at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity to an exemplary sequence of the invention over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, or, a nucleic acid that hybridizes under stringent conditions to a nucleic acid sequence of the invention. The overexpression can be effected by any means, e.g., use of a high activity promoter, a dicistronic vector or by gene amplification of the vector.

The nucleic acids of the invention can be expressed, or overexpressed, in any in vitro or in vivo expression system. Any cell culture system can be employed to express, or over-express, recombinant protein, including bacterial, insect, yeast, fungal or mammalian cultures. Over-expression can be effected by appropriate choice of promoters, enhancers, vectors (e.g., use of replicon vectors or dicistronic vectors (see, e.g., Gurtu (1996) Biochem. Biophys. Res. Commun. 229:295-8)), media, culture systems and the like. In one aspect, gene amplification using selection markers, e.g., glutamine synthetase (see, e.g., Sanders (1987) Dev. Biol. Stand. 66:55-63), in cell systems is used to overexpress the polypeptides of the invention. The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, mammalian cells, insect cells, or plant cells. The selection of an appropriate host is within the abilities of those skilled in the art.

The vector may be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts (described by Gluzman, Cell, 23:175, 1981) and other cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3, CHO, HeLa and BHK cell lines.

Alternatively, the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof, can be synthetically produced by conventional peptide synthesizers, e.g., as discussed below. In other aspects, fragments or portions of the polypeptides may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides.

Cell-free translation systems can also be employed to produce one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof, using mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some aspects, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.

Amplification of Nucleic Acids

In practicing the invention, nucleic acids of the invention and nucleic acids encoding the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention, or modified nucleic acids of the invention, can be reproduced by amplification, e.g., PCR. Amplification can also be used to clone or modify the nucleic acids of the invention. Thus, the invention provides amplification primer sequence pairs for amplifying nucleic acids of the invention. One of skill in the art can design amplification primer sequence pairs for any part of or the full length of these sequences.

In one aspect, the invention provides a nucleic acid amplified by an amplification primer pair of the invention, e.g., a primer pair as set forth by about the first (the 5′) 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of the invention, and about the first (the 5′) 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of the complementary strand. The invention provides amplification primer sequence pairs for amplifying a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, wherein the primer pair is capable of amplifying a nucleic acid comprising a sequence of the invention, or fragments or subsequences thereof. One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50 or more consecutive bases of the sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more consecutive bases of the sequence. The invention provides amplification primer pairs, wherein the primer pair comprises a first member having a sequence as set forth by about the first (the 5′) 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of the invention, and a second member having a sequence as set forth by about the first (the 5′) 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of the complementary strand of the first member.
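By way of illustration only, the derivation of such a primer pair from a template can be sketched in Python as follows. The template string and the function names are hypothetical and are not taken from the Sequence Listing; the sketch assumes a plain DNA alphabet (A, C, G, T) and ignores practical design criteria such as melting temperature.

```python
# Illustrative sketch only: the names and the template below are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Take the first (5') `length` residues of the template and of its
    complementary strand as the two members of a primer pair."""
    if not 0 < length <= len(template):
        raise ValueError("primer length must be between 1 and the template length")
    forward = template[:length]
    reverse = reverse_complement(template)[:length]
    return forward, reverse


# Made-up 50-residue template used only to exercise the functions.
template = "ATGGCTAGCTTGACCGATCGATTACGGCTAAGCTTGGATCCGTACGATAA"
forward, reverse = primer_pair(template, 15)
print(forward)
print(reverse)
```

The second member is read from the 5′ end of the complementary strand, so it anneals to the 3′ end of the template and the two members together bracket the full length of the sequence.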

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. The invention provides methods of making a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme by amplification, e.g., PCR, using an amplification primer pair of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental library.

Amplification reactions can also be used to quantify the amount of nucleic acid in a sample (such as the amount of message in a cell sample), label the nucleic acid (e.g., to apply it to an array or a blot), detect the nucleic acid, or quantify the amount of a specific nucleic acid in a sample. In one aspect of the invention, message isolated from a cell or a cDNA library is amplified.

The skilled artisan can select and design suitable oligonucleotide amplification primers. Amplification methods are also well known in the art, and include, e.g., polymerase chain reaction, PCR (see, e.g., PCR PROTOCOLS, A GUIDE TO METHODS AND APPLICATIONS, ed. Innis, Academic Press, N.Y. (1990) and PCR STRATEGIES (1995), ed. Innis, Academic Press, Inc., N.Y.); ligase chain reaction (LCR) (see, e.g., Wu (1989) Genomics 4:560; Landegren (1988) Science 241:1077; Barringer (1990) Gene 89:117); transcription amplification (see, e.g., Kwoh (1989) Proc. Natl. Acad. Sci. USA 86:1173); self-sustained sequence replication (see, e.g., Guatelli (1990) Proc. Natl. Acad. Sci. USA 87:1874); Q Beta replicase amplification (see, e.g., Smith (1997) J. Clin. Microbiol. 35:1477-1491); automated Q-beta replicase amplification assay (see, e.g., Burg (1996) Mol. Cell. Probes 10:257-271); and other RNA polymerase mediated techniques (e.g., NASBA, Cangene, Mississauga, Ontario); see also Berger (1987) Methods Enzymol. 152:307-316; Sambrook; Ausubel; U.S. Pat. Nos. 4,683,195 and 4,683,202; Sooknanan (1995) Biotechnology 13:563-564.

Determining Sequence Identity in Nucleic Acids and Polypeptides

The invention provides nucleic acids comprising sequences having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity (homology) to an exemplary nucleic acid of the invention (see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing) over a region of at least about 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550 or more, residues. The invention provides polypeptides comprising sequences having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary polypeptide of the invention (see Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing). The extent of sequence identity (homology) may be determined using any computer program and associated parameters, including those described herein, such as BLAST 2.2.2. or FASTA version 3.0t78, with the default parameters.

Nucleic acid sequences of the invention can comprise at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive nucleotides of an exemplary sequence of the invention and sequences substantially identical thereto. Homologous sequences and fragments of nucleic acid sequences of the invention can refer to a sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity (homology) to these sequences. Homology (sequence identity) may be determined using any of the computer programs and parameters described herein, including FASTA version 3.0t78 with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid sequences of the invention. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. It will be appreciated that the nucleic acid sequences of the invention can be represented in the traditional single character format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd Ed., W. H. Freeman & Co., New York) or in any other format which records the identity of the nucleotides in a sequence.

In various aspects, sequence comparison programs identified herein are used in this aspect of the invention, i.e., to determine if a nucleic acid or polypeptide sequence is within the scope of the invention. However, protein and/or nucleic acid sequence identities (homologies) may be evaluated using any sequence comparison algorithm or program known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA and CLUSTALW (see, e.g., Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85(8):2444-2448, 1988; Altschul et al., J. Mol. Biol. 215(3):403-410, 1990; Thompson et al., Nucleic Acids Res. 22(2):4673-4680, 1994; Higgins et al., Methods Enzymol. 266:383-402, 1996; Altschul et al., J. Mol. Biol. 215(3):403-410, 1990; Altschul et al., Nature Genetics 3:266-272, 1993).

In one aspect, homology or identity is measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). Such software matches similar sequences by assigning degrees of homology to various deletions, substitutions and other modifications. In one aspect, the terms “homology” and “identity” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same when compared and aligned for maximum correspondence over a comparison window or designated region as measured using any number of sequence comparison algorithms or by manual alignment and visual inspection. In one aspect, for sequence comparison, one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
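For illustration, once a test sequence and a reference sequence have been aligned, the percent identity calculation can be sketched in Python as follows. The function name and the example strings are hypothetical. The sketch assumes that gaps are written as "-" and that identity is counted over all alignment columns, which is only one of several conventions in use; individual programs may choose a different denominator.

```python
# Illustrative sketch only: the denominator convention (all alignment columns,
# gaps included) is an assumption, not a convention stated in this description.
def percent_identity(aligned_test: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_test, aligned_ref)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Made-up pre-aligned pair: four identical columns out of six.
identity = percent_identity("ACGT-A", "ACCTGA")
print(round(identity, 2))
```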

A “comparison window”, as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444, 1988, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection.

Other algorithms for determining homology or identity include, for example, in addition to a BLAST program (Basic Local Alignment Search Tool at the National Center for Biotechnology Information), ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple Sequence Alignment), ASSET (Aligned Segment Statistical Evaluation Tool), BANDS, BESTSCOR, BIOSCAN (Biological Sequence Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), FASTA, Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS, LCONSENSUS, WCONSENSUS, Smith-Waterman algorithm, DARWIN, Las Vegas algorithm, FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch, DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis Package), GAP (Global Alignment Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence Comparison), LALIGN (Local Sequence Alignment), LCP (Local Content Program), MACAW (Multiple Alignment Construction & Analysis Workbench), MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern-Induced Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and WHAT-IF. Such alignment programs can also be used to screen genome databases to identify polynucleotide sequences having substantially identical sequences. A number of genome databases are available; for example, a substantial portion of the human genome is available as part of the Human Genome Sequencing Project (Gibbs, 1995). At least twenty-one other genomes have already been sequenced, including, for example, M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997), yeast (S. cerevisiae) (Mewes et al., 1997) and D. melanogaster (Adams et al., 2000). Significant progress has also been made in sequencing the genomes of model organisms, such as mouse, C. elegans and Arabidopsis sp.

Several databases containing genomic information annotated with some functional information are maintained by different organizations and may be accessible via the internet.
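As a non-limiting illustration of the local homology approach of Smith & Waterman referred to above, the following Python sketch computes the best local alignment score of two sequences by dynamic programming. The scoring values (match +1, mismatch −1, linear gap penalty −2) and the function name are assumptions chosen for the example; they are not parameters specified in this description, and the sketch reports only the score, not the alignment itself.

```python
# Illustrative sketch only: scoring values are assumed, and gaps use a simple
# linear penalty rather than the affine penalties used by many programs.
def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never fall below 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The two made-up sequences share only the local segment ACGT.
score = smith_waterman_score("TTACGTTT", "GGACGTGG")
print(score)
```

Flooring each cell at zero is what distinguishes this local method from the global alignment of Needleman & Wunsch, in which every residue of both sequences must be accounted for.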

In one aspect, BLAST and BLAST 2.0 algorithms are used, which are described in Altschul et al., J. Mol. Biol. 215:403-410, 1990 and Altschul et al., Nuc. Acids Res. 25:3389-3402, 1997, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=−4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915, 1992), alignments (B) of 50, M=5, N=−4 and a comparison of both strands.
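The seed-and-extend procedure described above can be sketched, for nucleotide sequences and without gaps, as follows. This is a simplified illustration and not the BLAST implementation: seeds are exact word matches only (the neighborhood threshold T is not modeled), the value of X is an assumed example value, and all function names and test sequences are hypothetical. The values W=11, M=5 and N=−4 are the BLASTN defaults stated above.

```python
# Illustrative sketch only: exact-match seeds and ungapped extension; X is assumed.
def find_seeds(query: str, subject: str, w: int = 11):
    """Yield (query_pos, subject_pos) for every exact word match of length w."""
    index = {}
    for s in range(len(subject) - w + 1):
        index.setdefault(subject[s:s + w], []).append(s)
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            yield q, s


def extend(query, subject, qi, si, step, start_score, m, n, x):
    """Extend one direction from (qi, si); return (score gained, residues added)."""
    score = best = start_score
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += m if query[qi] == subject[si] else n
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt when the score falls X below its maximum or reaches zero or below.
        if best - score >= x or score <= 0:
            break
        qi += step
        si += step
    return best - start_score, best_len


def hsp_from_seed(query, subject, q, s, w=11, m=5, n=-4, x=20):
    """Return (score, q_start, q_end, s_start, s_end) with end positions exclusive."""
    seed_score = w * m
    gain_r, len_r = extend(query, subject, q + w, s + w, 1, seed_score, m, n, x)
    gain_l, len_l = extend(query, subject, q - 1, s - 1, -1, seed_score + gain_r, m, n, x)
    return (seed_score + gain_r + gain_l,
            q - len_l, q + w + len_r, s - len_l, s + w + len_r)


# Made-up sequences sharing a 12-residue segment flanked by mismatches.
query = "GGG" + "ACGT" * 3 + "TTT"
subject = "CCC" + "ACGT" * 3 + "AAA"
seeds = list(find_seeds(query, subject))
hsp = hsp_from_seed(query, subject, *seeds[0])
print(seeds, hsp)
```

In this example either seed extends to the same high scoring segment pair, which covers the twelve shared residues and scores 12 × M.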

The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873, 1993). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, in one aspect less than about 0.01, and in another aspect less than about 0.001.

In one aspect, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool (“BLAST”). In particular, five specific BLAST programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database;
(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database;
(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database;
(4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and
(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
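The six-frame conceptual translation on which the translated searches listed above rely can be illustrated with the following Python sketch. It assumes the standard genetic code, builds the codon table from the conventional TCAG ordering, and writes unrecognized codons as "X"; the function names and the nine-residue example are hypothetical.

```python
# Illustrative sketch only: standard genetic code assumed; names are hypothetical.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Return translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames)
```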

The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as “high-scoring segment pairs,” between a query amino acid or nucleic acid sequence and a test sequence which is in one aspect obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are in one aspect identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. In one aspect, the scoring matrix used is the BLOSUM62 matrix (Gonnet (1992) Science 256:1443-1445; Henikoff and Henikoff (1993) Proteins 17:49-61). In another aspect, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation). BLAST programs are accessible through the U.S. National Library of Medicine.

The parameters used with the above algorithms may be adapted depending on the sequence length and degree of homology studied. In some aspects, the parameters may be the default parameters used by the algorithms in the absence of instructions from the user.

Computer Systems and Computer Program Products

The invention provides computers, computer systems, computer readable media, computer program products and the like having recorded or stored thereon the nucleic acid and polypeptide sequences of the invention. Additionally, in practicing the methods of the invention, e.g., to determine and identify sequence identities (to determine whether a nucleic acid is within the scope of the invention), structural homologies, motifs and the like in silico, a nucleic acid or polypeptide sequence of the invention can be stored, recorded, and manipulated on any medium which can be read and accessed by a computer.

As used herein, the words “recorded” and “stored” refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the nucleic acid and/or polypeptide sequences of the invention. As used herein, the terms “computer,” “computer program” and “processor” are used in their broadest general contexts and incorporate all such devices, as described in detail, below. A “coding sequence of” or a “sequence encodes” a particular polypeptide or protein, is a nucleic acid sequence which is transcribed and translated into a polypeptide or protein when placed under the control of appropriate regulatory sequences.

The polypeptides of the invention include exemplary sequences of the invention and sequences substantially identical thereto; and subsequences (fragments) of any of the preceding sequences. In one aspect, substantially identical, or homologous, polypeptide sequences refer to a polypeptide sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity (homology) to an exemplary sequence of the invention.

Homology (sequence identity) may be determined using any of the computer programs and parameters described herein. A nucleic acid or polypeptide sequence of the invention can be stored, recorded and manipulated on any medium which can be read and accessed by a computer. As used herein, the words “recorded” and “stored” refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the nucleic acid sequences of the invention, or one or more of the polypeptide sequences of the invention. Another aspect of the invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, or 20 or more nucleic acid or polypeptide sequences of the invention.

Another aspect of the invention is a computer readable medium having recorded thereon one or more of the nucleic acid sequences of the invention. Another aspect of the invention is a computer readable medium having recorded thereon one or more of the polypeptide sequences of the invention. Another aspect of the invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, or 20 or more of the nucleic acid or polypeptide sequences as set forth above.

Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM), as well as other types of media known to those skilled in the art.

Aspects of the invention include systems (e.g., internet based systems), e.g., computer systems which store and manipulate the sequence information described herein. One example of a computer system 100 is illustrated in block diagram form in FIG. 1. As used herein, “a computer system” refers to the hardware components, software components and data storage components used to analyze a nucleotide sequence of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention. In one aspect, the computer system 100 includes a processor for processing, accessing and manipulating the sequence data. The processor 105 can be any well-known type of central processing unit, such as, for example, the Pentium III from Intel Corporation, or a similar processor from Sun, Motorola, Compaq, AMD or International Business Machines.

In one aspect, the computer system 100 is a general purpose system that comprises the processor 105 and one or more internal data storage components 110 for storing data and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.

In one particular aspect, the computer system 100 includes a processor 105 connected to a bus which is connected to a main memory 115 (in one aspect implemented as RAM) and one or more internal data storage devices 110, such as a hard drive and/or other computer readable media having data recorded thereon. In some aspects, the computer system 100 further includes one or more data retrieving devices 118 for reading the data stored on the internal data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, or a modem capable of connection to a remote data storage system (e.g., via the internet), etc. In some aspects, the internal data storage device 110 is a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer system 100 may advantageously include or be programmed by appropriate software for reading the control logic and/or the data from the data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output to a computer user. It should also be noted that the computer system 100 can be linked to other computer systems 125a-c in a network or wide area network to provide centralized access to the computer system 100.

Software for accessing and processing the nucleotide sequences of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention (such as search tools, compare tools and modeling tools, etc.), may reside in main memory 115 during execution.

In some aspects, the computer system 100 may further comprise a sequence comparison algorithm for comparing a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, stored on a computer readable medium to a reference nucleotide or polypeptide sequence(s) stored on a computer readable medium. A “sequence comparison algorithm” refers to one or more programs which are implemented (locally or remotely) on the computer system 100 to compare a nucleotide sequence with other nucleotide sequences and/or compounds stored within a data storage means. For example, the sequence comparison algorithm may compare the nucleotide sequences of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, stored on a computer readable medium to reference sequences stored on a computer readable medium to identify homologies or structural motifs.

FIG. 2 is a flow diagram illustrating one aspect of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. The database of sequences can be a private database stored within the computer system 100, or a public database such as GENBANK that is available through the Internet.

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new sequence to be compared is stored to a memory in a computer system 100. As discussed above, the memory could be any type of memory, including RAM or an internal storage device.

The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the database is read into a memory on the computer. A comparison is then performed at a state 210 to determine if the first sequence is the same as the second sequence. It is important to note that this step is not limited to performing an exact comparison between the new sequence and the first sequence in the database. Methods are well known to those of skill in the art for comparing two nucleotide or protein sequences, even if they are not identical. For example, gaps can be introduced into one sequence in order to raise the homology level between the two tested sequences. The parameters that control whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the computer system.

Once a comparison of the two sequences has been performed at the state 210, a determination is made at a decision state 212 whether the two sequences are the same. Of course, the term “same” is not limited to sequences that are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as “same” in the process 200.

If a determination is made that the two sequences are the same, the process 200 moves to a state 214 wherein the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 218 wherein a determination is made whether more sequences exist in the database. If no more sequences exist in the database, then the process 200 terminates at an end state 220. However, if more sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new sequence is aligned and compared with every sequence in the database.

It should be noted that if a determination had been made at the decision state 212 that the sequences were not homologous, then the process 200 would move immediately to the decision state 218 in order to determine if any other sequences were available in the database for comparison.
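By way of illustration only, the loop of process 200 might be sketched in Python as follows. The function names, the in-memory dictionary standing in for the database, the default threshold, and the use of the standard library `difflib.SequenceMatcher` as a stand-in for a true alignment score are all assumptions made for this sketch and are not part of the disclosure.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for a real alignment score (assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def scan_database(new_seq: str, database: dict, threshold: float = 0.9) -> list:
    """Return the names of database sequences that count as "same" as new_seq.

    "Same" here means the similarity meets the user-entered threshold,
    mirroring the homology parameters described for decision state 212.
    """
    hits = []
    for name, seq in database.items():  # states 206 and 224: read each sequence in turn
        if similarity(new_seq, seq) >= threshold:  # states 210 and 212: compare and decide
            hits.append(name)  # state 214: report the sequence name
    return hits  # state 220: no more sequences in the database
```

For example, `scan_database("ATGCATGC", {"a": "ATGCATGC", "b": "TTTTTTTT"}, threshold=0.8)` would report only the entry named `a`.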

Accordingly, one aspect of the invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, a data storage device having retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to a nucleic acid sequence of the invention, or a polypeptide sequence of the invention and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the above described nucleic acid sequence of the invention, or polypeptide sequence of the invention, or it may identify structural motifs in sequences which are compared to these nucleic acid codes and polypeptide codes. In some aspects, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30 or 40 or more of the nucleic acid sequences of the invention, or the polypeptide sequences of the invention.

Another aspect of the invention is a method for determining the level of homology between a nucleic acid sequence of the invention, or a polypeptide sequence of the invention and a reference nucleotide sequence. The method includes reading the nucleic acid code or the polypeptide code and the reference nucleotide or polypeptide sequence through the use of a computer program which determines homology levels and determining homology between the nucleic acid code or polypeptide code and the reference nucleotide or polypeptide sequence with the computer program. The computer program may be any of a number of computer programs for determining homology levels, including those specifically enumerated herein (e.g., BLAST2N with the default parameters or with any modified parameters). The method may be implemented using the computer systems described above. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30 or 40 or more of the above described nucleic acid sequences of the invention, or the polypeptide sequences of the invention through use of the computer program and determining homology between the nucleic acid codes or polypeptide codes and reference nucleotide sequences or polypeptide sequences.

FIG. 3 is a flow diagram illustrating one aspect of a process 250 in a computer for determining whether two sequences are homologous. The process 250 begins at a start state 252 and then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored to a memory at a state 256. The process 250 then moves to a state 260 wherein the first character in the first sequence is read and then to a state 262 wherein the first character of the second sequence is read. It should be understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein sequence, then it is in one aspect in the single letter amino acid code so that the first and second sequences can be easily compared.

A determination is then made at a decision state 264 whether the two characters are the same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in the first and second sequences are read. A determination is then made whether the next characters are the same. If they are, then the process 250 continues this loop until two characters are not the same. If a determination is made that the next two characters are not the same, the process 250 moves to a decision state 274 to determine whether there are any more characters in either sequence to read.

If there are not any more characters to read, then the process 250 moves to a state 276 wherein the level of homology between the first and second sequences is displayed to the user. The level of homology is determined by calculating the proportion of characters between the sequences that were the same out of the total number of characters in the first sequence. Thus, if every character in a first 100 nucleotide sequence aligned with every character in a second sequence, the homology level would be 100%.
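The homology level calculation described for state 276 can be sketched as below. This is a minimal sketch, assuming the two sequences are already positioned character against character with no gaps; the function name `homology_level` is hypothetical, and the sketch counts matches across the whole length rather than reproducing every state of FIG. 3.

```python
def homology_level(first: str, second: str) -> float:
    """Percentage of positions in `first` whose character matches `second`.

    Assumes ungapped, position-by-position comparison (an assumption of this
    sketch). The denominator is the number of characters in the first sequence.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    # zip stops at the shorter sequence, so unmatched trailing positions
    # of the first sequence simply count as non-matching.
    matches = sum(1 for a, b in zip(first.upper(), second.upper()) if a == b)
    return 100.0 * matches / len(first)
```

Under these assumptions, two identical sequences give 100%, and a four-character sequence differing from the second sequence at one position gives 75%.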

Alternatively, the computer program may be a computer program which compares the nucleotide sequences of a nucleic acid sequence as set forth in the invention, to one or more reference nucleotide sequences in order to determine whether the nucleic acid code of the invention differs from a reference nucleic acid sequence at one or more positions. Optionally such a program records the length and identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the reference polynucleotide or a nucleic acid sequence of the invention. In one aspect, the computer program may be a program which determines whether a nucleic acid sequence of the invention contains a single nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence.

Accordingly, another aspect of the invention is a method for determining whether a nucleic acid sequence of the invention differs at one or more nucleotides from a reference nucleotide sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use of a computer program which identifies differences between nucleic acid sequences and identifying differences between the nucleic acid code and the reference nucleotide sequence with the computer program. In some aspects, the computer program is a program which identifies single nucleotide polymorphisms. The method may be implemented by the computer systems described above and the method illustrated in FIG. 3. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 40 or more of the nucleic acid sequences of the invention and the reference nucleotide sequences through the use of the computer program and identifying differences between the nucleic acid codes and the reference nucleotide sequences with the computer program.
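As one hedged illustration of identifying single nucleotide differences, the sketch below reports substituted positions between a query and a reference. It assumes the two sequences have already been aligned to equal length, so inserted or deleted nucleotides are outside its scope; the function name and the tuple layout are hypothetical.

```python
def find_substitutions(query: str, reference: str) -> list:
    """List (position, reference_base, query_base) for each substituted nucleotide.

    Positions are 1-based. Both sequences must be pre-aligned and of equal
    length (an assumption of this sketch; indels are not handled).
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for index, (q, r) in enumerate(zip(query.upper(), reference.upper()), start=1):
        if q != r:
            differences.append((index, r, q))
    return differences
```

A result containing a single tuple would be consistent with a candidate SNP relative to the reference, although confirming a polymorphism requires more than a pairwise comparison.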

In other aspects the computer based system may further comprise an identifier for identifying features within a nucleic acid sequence of the invention or a polypeptide sequence of the invention. An “identifier” refers to one or more programs which identify certain features within a nucleic acid sequence of the invention, or a polypeptide sequence of the invention. In one aspect, the identifier may comprise a program which identifies an open reading frame in a nucleic acid sequence of the invention.

FIG. 4 is a flow diagram illustrating one aspect of an identifier process 300 for detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a database of sequence features is opened. Such a database would include a list of each feature's attributes along with the name of the feature. For example, a feature name could be “Initiation Codon” and the attribute would be “ATG”. Another example would be the feature name “TAATAA Box” and the feature attribute would be “TAATAA”. An example of such a database is produced by the University of Wisconsin Genetics Computer Group. Alternatively, the features may be structural polypeptide motifs such as alpha helices, beta sheets, or functional polypeptide motifs such as enzymatic active sites, helix-turn-helix motifs or other motifs known to those skilled in the art.

Once the database of features is opened at the state 306, the process 300 moves to a state 308 wherein the first feature is read from the database. A comparison of the attribute of the first feature with the first sequence is then made at a state 310. A determination is then made at a decision state 316 whether the attribute of the feature was found in the first sequence. If the attribute was found, then the process 300 moves to a state 318 wherein the name of the found feature is displayed to the user.

The process 300 then moves to a decision state 320 wherein a determination is made whether more features exist in the database. If no more features do exist, then the process 300 terminates at an end state 324. However, if more features do exist in the database, then the process 300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the attribute of the next feature is compared against the first sequence. It should be noted that if the feature attribute is not found in the first sequence at the decision state 316, the process 300 moves directly to the decision state 320 in order to determine if any more features exist in the database.
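The identifier process 300 can be sketched, under simplifying assumptions, as a loop over a small feature table. The dictionary below uses the two example features named above; treating each attribute as a literal substring, and the names `FEATURES` and `find_features`, are assumptions of this sketch rather than a description of any particular feature database.

```python
# Feature name -> attribute, using the two examples given in the text.
FEATURES = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def find_features(sequence: str, features: dict = FEATURES) -> list:
    """Return the names of features whose attribute occurs in the sequence."""
    seq = sequence.upper()
    found = []
    for name, attribute in features.items():  # states 308 and 326: read each feature
        if attribute in seq:  # states 310 and 316: compare attribute with the sequence
            found.append(name)  # state 318: report the feature name
    return found
```

Structural or functional polypeptide motifs would generally need pattern or profile matching rather than the literal substring test used here.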

Accordingly, another aspect of the invention is a method of identifying a feature within a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, comprising reading the nucleic acid code(s) or polypeptide code(s) through the use of a computer program which identifies features therein and identifying features within the nucleic acid code(s) with the computer program. In one aspect, the computer program comprises a computer program which identifies open reading frames. The method may be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 40 or more of the nucleic acid sequences of the invention, or the polypeptide sequences of the invention, through the use of the computer program and identifying features within the nucleic acid codes or polypeptide codes with the computer program.
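For illustration, a program which identifies open reading frames might be sketched as follows. This is a simplified sketch that scans only the forward strand, uses ATG as the sole start codon and the standard stop codons TAA, TAG and TGA, and introduces a hypothetical `min_codons` cutoff; none of these choices is specified in the text above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list:
    """Return (start, end) index pairs of forward-strand open reading frames.

    `start` is the 0-based index of the A in ATG; `end` is one past the last
    base of the stop codon, so sequence[start:end] is the full ORF.
    `min_codons` counts codons before the stop codon, including the ATG.
    """
    seq = sequence.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break  # the first in-frame stop ends this reading frame
    return orfs
```

A fuller implementation would typically also scan the reverse complement and handle alternative start codons.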

A nucleic acid sequence of the invention, or a polypeptide sequence of the invention, may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, may be stored as text in a word processing file, such as Microsoft WORD™ or WORDPERFECT™ or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2™, SYBASE™, or ORACLE™. In addition, many computer programs and databases may be used as sequence comparison algorithms, identifiers, or sources of reference nucleotide sequences or polypeptide sequences to be compared to a nucleic acid sequence of the invention, or a polypeptide sequence of the invention. The following list is intended not to limit the invention but to provide guidance to programs and databases which are useful with the nucleic acid sequences of the invention, or the polypeptide sequences of the invention.

The programs and databases which may be used include, but are not limited to: MACPATTERN™ (EMBL), DISCOVERYBASE™ (Molecular Applications Group), GENEMINE™ (Molecular Applications Group), LOOK™ (Molecular Applications Group), MACLOOK™ (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215:403, 1990), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444, 1988), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990), CATALYST™ (Molecular Simulations Inc.), Catalyst/SHAPE™ (Molecular Simulations Inc.), Cerius².DBAccess™ (Molecular Simulations Inc.), HYPOGEN™ (Molecular Simulations Inc.), INSIGHT II™ (Molecular Simulations Inc.), DISCOVER™ (Molecular Simulations Inc.), CHARMm™ (Molecular Simulations Inc.), FELIX™ (Molecular Simulations Inc.), DELPHI™ (Molecular Simulations Inc.), QuanteMM™ (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), MODELER™ (Molecular Simulations Inc.), ISIS™ (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByte MasterFile database, the Genbank database and the Genseqn database. Many other programs and databases would be apparent to one of skill in the art given the present disclosure.

Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites and enzymatic cleavage sites.

Hybridization of Nucleic Acids

The invention provides isolated, synthetic or recombinant nucleic acids that hybridize under stringent conditions to an exemplary sequence of the invention (e.g., SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:167, SEQ ID NO:169, SEQ ID NO:171, SEQ ID NO:173, SEQ ID NO:175, SEQ ID NO:177, SEQ ID NO:179, SEQ ID NO:181, SEQ ID NO:183, SEQ ID NO:185, SEQ ID NO:187, SEQ ID NO:189, SEQ ID NO:191, SEQ ID NO:193, SEQ ID NO:195, SEQ ID NO:197, SEQ ID NO:199, SEQ ID NO:201, SEQ ID NO:203, SEQ ID NO:205, SEQ ID NO:207, SEQ ID NO:209, SEQ ID NO:211, SEQ ID NO:213, SEQ ID NO:215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID NO:221, SEQ ID NO:223, SEQ ID NO:225, SEQ ID NO:227, SEQ ID NO:229, SEQ ID NO:231, SEQ ID NO:233, SEQ ID NO:235, SEQ ID NO:237, SEQ ID NO:239, SEQ ID NO:241, SEQ ID NO:243, SEQ ID NO:245, SEQ ID NO:247, SEQ ID NO:249, SEQ ID NO:251, SEQ ID NO:253, SEQ ID NO:255, SEQ ID NO:257, SEQ ID NO:259, SEQ ID NO:261, SEQ ID NO:263, SEQ ID NO:265, SEQ ID NO:267, SEQ ID NO:269, SEQ ID NO:271, SEQ ID NO:273, SEQ ID NO:275, SEQ ID NO:277, SEQ ID NO:279, SEQ ID NO:281, SEQ ID NO:283, SEQ ID NO:285, SEQ ID NO:287, SEQ ID NO:289, SEQ ID NO:291, SEQ ID NO:293, SEQ ID NO:295, SEQ ID NO:297, SEQ ID NO:299, SEQ ID NO:301, SEQ ID NO:303, SEQ ID NO:305, SEQ ID NO:307, SEQ ID NO:309, SEQ ID NO:311, SEQ ID NO:313, SEQ ID NO:315, SEQ ID NO:317, SEQ ID NO:319, SEQ ID NO:321, SEQ ID NO:323, SEQ ID NO:325, SEQ ID NO:327, SEQ ID NO:329, SEQ ID NO:331, SEQ ID NO:333, SEQ ID NO:335, SEQ ID NO:337, SEQ ID NO:339, SEQ ID NO:341, SEQ ID NO:343, SEQ ID NO:345, SEQ ID NO:347, SEQ ID NO:349, SEQ ID NO:351, SEQ ID NO:353, SEQ ID NO:355, SEQ ID NO:357, SEQ ID NO:359, SEQ ID NO:361, SEQ ID NO:363, SEQ ID NO:365, SEQ ID NO:367, SEQ ID NO:369, SEQ ID NO:371, SEQ ID NO:373, SEQ ID NO:375, SEQ ID NO:377, SEQ ID NO:379, SEQ ID NO:381, SEQ ID NO:383, SEQ ID NO:385, SEQ ID NO:387, SEQ ID NO:389, SEQ ID NO:391, SEQ ID NO:393, SEQ ID NO:395, SEQ ID NO:397, SEQ ID NO:399, SEQ ID NO:401, SEQ ID NO:403, SEQ ID NO:405, SEQ ID NO:407, SEQ ID NO:409, SEQ ID NO:411, SEQ ID NO:413, SEQ ID NO:415, SEQ ID NO:417, SEQ ID NO:419, SEQ ID NO:421, SEQ ID NO:423, SEQ ID NO:425, SEQ ID NO:427, SEQ ID NO:429, SEQ ID NO:431, SEQ ID NO:433, SEQ ID NO:435, SEQ ID NO:437, SEQ ID NO:439, SEQ ID NO:441, SEQ ID NO:443, SEQ ID NO:445, SEQ ID NO:447, SEQ ID NO:449, SEQ ID NO:451, SEQ ID NO:453, SEQ ID NO:455, SEQ ID NO:457, SEQ ID NO:459, SEQ ID NO:461, SEQ ID NO:463, SEQ ID NO:465, SEQ ID NO:467, SEQ ID NO:469, SEQ ID NO:471, SEQ ID NO:473, SEQ ID NO:475, SEQ ID NO:477, SEQ ID NO:479, SEQ ID NO:481, SEQ ID NO:483, SEQ ID NO:485, SEQ ID NO:487, SEQ ID NO:489, SEQ ID NO:491, SEQ ID NO:493, SEQ ID NO:495, SEQ ID NO:497, SEQ ID NO:499, SEQ ID NO:501, SEQ ID NO:503, SEQ ID NO:505, SEQ ID NO:507, SEQ ID NO:509, SEQ ID NO:511, SEQ ID NO:513, SEQ ID NO:515, SEQ ID NO:517, SEQ ID NO:519, SEQ ID NO:521 and/or SEQ ID NO:523; see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing). The stringent conditions can be highly stringent conditions, medium stringent conditions and/or low stringent conditions, including the high and reduced stringency conditions described herein. In one aspect, it is the stringency of the wash conditions that set forth the conditions which determine whether a nucleic acid is within the scope of the invention, as discussed below.

“Hybridization” refers to the process by which a nucleic acid strand joins with a complementary strand through base pairing. Hybridization reactions can be sensitive and selective so that a particular sequence of interest can be identified even in samples in which it is present at low concentrations. Suitably stringent conditions can be defined by, for example, the concentrations of salt or formamide in the prehybridization and hybridization solutions, or by the hybridization temperature and are well known in the art. In alternative aspects, stringency can be increased by reducing the concentration of salt, increasing the concentration of formamide, or raising the hybridization temperature. In alternative aspects, nucleic acids of the invention are defined by their ability to hybridize under various stringency conditions (e.g., high, medium, and low), as set forth herein.

In one aspect, hybridization under high stringency conditions comprises about 50% formamide at about 37° C. to 42° C. In one aspect, hybridization conditions comprise reduced stringency conditions in about 35% to 25% formamide at about 30° C. to 35° C. In one aspect, hybridization conditions comprise high stringency conditions, e.g., at 42° C. in 50% formamide, 5×SSPE, 0.3% SDS and 200 ug/ml sheared and denatured salmon sperm DNA. In one aspect, hybridization conditions comprise these reduced stringency conditions, but in 35% formamide at a reduced temperature of 35° C. The temperature range corresponding to a particular level of stringency can be further narrowed by calculating the purine to pyrimidine ratio of the nucleic acid of interest and adjusting the temperature accordingly. Variations on the above ranges and conditions are well known in the art.

In alternative aspects, nucleic acids of the invention as defined by their ability to hybridize under stringent conditions can be between about five residues and the full length of nucleic acid of the invention; e.g., they can be at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 55, 60, 65, 70, 75, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, or more, residues in length. Nucleic acids shorter than full length are also included. These nucleic acids can be useful as, e.g., hybridization probes, labeling probes, PCR oligonucleotide probes, siRNA or miRNA (single or double stranded), antisense or sequences encoding antibody binding peptides (epitopes), motifs, active sites and the like.

In one aspect, nucleic acids of the invention are defined by their ability to hybridize under high stringency comprising conditions of about 50% formamide at about 37° C. to 42° C. In one aspect, nucleic acids of the invention are defined by their ability to hybridize under reduced stringency comprising conditions in about 35% to 25% formamide at about 30° C. to 35° C.

Alternatively, nucleic acids of the invention are defined by their ability to hybridize under high stringency comprising conditions at 42° C. in 50% formamide, 5×SSPE, 0.3% SDS, and a repetitive sequence blocking nucleic acid, such as cot-1 or salmon sperm DNA (e.g., 200 ug/ml sheared and denatured salmon sperm DNA). In one aspect, nucleic acids of the invention are defined by their ability to hybridize under reduced stringency conditions comprising 35% or 40% formamide at a reduced temperature of 35° C. or 42° C.

In nucleic acid hybridization reactions, the conditions used to achieve a particular level of stringency will vary, depending on the nature of the nucleic acids being hybridized. For example, the length, degree of complementarity, nucleotide sequence composition (e.g., GC v. AT content) and nucleic acid type (e.g., RNA v. DNA) of the hybridizing regions of the nucleic acids can be considered in selecting hybridization conditions. An additional consideration is whether one of the nucleic acids is immobilized, for example, on a filter.

Hybridization may be carried out under conditions of low stringency, moderate stringency or high stringency. As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45° C. in a solution consisting of 0.9 M NaCl, 50 mM NaH₂PO₄, pH 7.0, 5.0 mM Na₂EDTA, 0.5% SDS, 10×Denhardt's and 0.5 mg/ml polyriboadenylic acid. Approximately 2×10⁷ cpm (specific activity 4-9×10⁸ cpm/ug) of ³²P end-labeled oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1×SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na₂EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1×SET at T_m−10° C. for the oligonucleotide probe. The membrane is then exposed to auto-radiographic film for detection of hybridization signals. All of the foregoing hybridizations would be considered to be under conditions of high stringency.
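The wash temperature of T_m−10° C. for an oligonucleotide probe can be estimated once a melting temperature is available. The text does not say how T_m is obtained; the sketch below assumes the Wallace rule (2° C. per A or T and 4° C. per G or C), a common rough approximation for short oligonucleotides that is introduced here only for illustration. The function names are hypothetical.

```python
def wallace_tm(probe: str) -> float:
    """Approximate T_m in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    This rule is an assumption of this sketch and is only a rough guide
    for short oligonucleotide probes.
    """
    p = probe.upper()
    at_count = p.count("A") + p.count("T")
    gc_count = p.count("G") + p.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def wash_temperature(probe: str, offset: float = 10.0) -> float:
    """Temperature for the second wash, taken as T_m minus the offset."""
    return wallace_tm(probe) - offset
```

For a 16-base probe with eight A or T and eight G or C residues, this estimate gives a T_m of 48° C. and a wash temperature of 38° C. Salt and formamide concentrations also shift the true T_m and are ignored here.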

Following hybridization, a filter can be washed to remove any non-specifically bound detectable probe. The stringency used to wash the filters can also be varied depending on the nature of the nucleic acids being hybridized, the length of the nucleic acids being hybridized, the degree of complementarity, the nucleotide sequence composition (e.g., GC v. AT content) and the nucleic acid type (e.g., RNA v. DNA). Examples of progressively higher stringency condition washes are as follows: 2×SSC, 0.1% SDS at room temperature for 15 minutes (low stringency); 0.1×SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour (moderate stringency); 0.1×SSC, 0.5% SDS for 15 to 30 minutes at between the hybridization temperature and 68° C. (high stringency); and 0.15 M NaCl for 15 minutes at 72° C. (very high stringency). A final low stringency wash can be conducted in 0.1×SSC at room temperature. The examples above are merely illustrative of one set of conditions that can be used to wash filters. One of skill in the art would know that there are numerous recipes for different stringency washes. Some other examples are given below.

In one aspect, hybridization conditions comprise a wash step comprising a wash for 30 minutes at room temperature in a solution comprising 1× 150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na₂EDTA, 0.5% SDS, followed by a 30 minute wash in fresh solution.

Nucleic acids which have hybridized to the probe are identified by autoradiography or other conventional techniques.

The above procedures may be modified to identify nucleic acids having decreasing levels of sequence identity (homology) to the probe sequence. For example, to obtain nucleic acids of decreasing sequence identity (homology) to the detectable probe, less stringent conditions may be used. For example, the hybridization temperature may be decreased in increments of 5° C. from 68° C. to 42° C. in a hybridization buffer having a Na+ concentration of approximately 1 M. Following hybridization, the filter may be washed with 2×SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be “moderate” conditions above 50° C. and “low” conditions below 50° C. A specific example of “moderate” hybridization conditions is when the above hybridization is conducted at 55° C. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 45° C.

Alternatively, the hybridization may be carried out in buffers, such as 6×SSC, containing formamide at a temperature of 42° C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6×SSC, 0.5% SDS at 50° C. These conditions are considered to be “moderate” conditions above 25% formamide and “low” conditions below 25% formamide. A specific example of “moderate” hybridization conditions is when the above hybridization is conducted at 30% formamide. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 10% formamide.
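The two stepwise schemes above, lowering the temperature in 5° C. steps or lowering the formamide concentration in 5% steps, can be sketched as simple series with a classification at each step. The function names are hypothetical. Because the text calls conditions “moderate” above the boundary and “low” below it without addressing the boundary value itself, the sketch returns "unspecified" at exactly 50° C. or exactly 25% formamide.

```python
def temperature_series(start: int = 68, stop: int = 42, step: int = 5) -> list:
    """Hybridization temperatures in degrees C, stepping down without passing `stop`."""
    return list(range(start, stop - 1, -step))


def formamide_series(start: int = 50, stop: int = 0, step: int = 5) -> list:
    """Formamide percentages, stepping down from `start` to `stop`."""
    return list(range(start, stop - 1, -step))


def classify_by_temperature(temp_c: float) -> str:
    """Classify the temperature-based scheme (buffer with about 1 M Na+)."""
    if temp_c > 50:
        return "moderate"
    if temp_c < 50:
        return "low"
    return "unspecified"  # the text does not classify exactly 50 degrees C


def classify_by_formamide(percent: float) -> str:
    """Classify the formamide-based scheme (6xSSC buffer at 42 degrees C)."""
    if percent > 25:
        return "moderate"
    if percent < 25:
        return "low"
    return "unspecified"  # the text does not classify exactly 25% formamide
```

Note that 5° C. steps starting at 68° C. give 68, 63, 58, 53, 48 and 43° C., so the series stops just above 42° C. rather than landing on it.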

However, the selection of a hybridization format may not be critical; it is the stringency of the wash conditions that set forth the conditions which determine whether a nucleic acid is within the scope of the invention. Wash conditions used to identify nucleic acids within the scope of the invention include, e.g.: a salt concentration of about 0.02 molar at pH 7 and a temperature of at least about 50° C. or about 55° C. to about 60° C.; or, a salt concentration of about 0.15 M NaCl at 72° C. for about 15 minutes; or, a salt concentration of about 0.2×SSC at a temperature of at least about 50° C. or about 55° C. to about 60° C. for about 15 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2×SSC containing 0.1% SDS at room temperature for 15 minutes and then washed twice by 0.1×SSC containing 0.1% SDS at 68° C. for 15 minutes; or, equivalent conditions. See Sambrook, Tijssen and Ausubel for a description of SSC buffer and equivalent conditions.

These methods may be used to isolate or identify nucleic acids of the invention. For example, the preceding methods may be used to isolate or identify nucleic acids having a sequence with at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity (homology) to a nucleic acid sequence selected from the group consisting of one of the sequences of the invention, or fragments comprising at least about 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases thereof and the sequences complementary thereto. Sequence identity (homology) may be measured using an alignment algorithm. For example, the homologous polynucleotides may have a coding sequence which is a naturally occurring allelic variant of one of the coding sequences described herein. Such allelic variants may have a substitution, deletion or addition of one or more nucleotides when compared to the nucleic acids of the invention. Additionally, the above procedures may be used to isolate nucleic acids which encode polypeptides having at least about 99%, 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55%, or at least 50% sequence identity (homology) to a polypeptide of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof as determined using a sequence alignment algorithm (e.g., such as the FASTA version 3.0t78 algorithm with the default parameters).

Oligonucleotide Probes and Methods for Using Them

The invention also provides nucleic acid probes that can be used, e.g., for identifying, amplifying, or isolating nucleic acids encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity or fragments thereof or for identifying cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme genes. In one aspect, the probe comprises at least about 10 consecutive bases of a nucleic acid of the invention. Alternatively, a probe of the invention can be at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 150 or about 10 to 50, about 20 to 60, about 30 to 70, consecutive bases of a sequence of a nucleic acid of the invention. The probes identify a nucleic acid by binding and/or hybridization. The probes can be used in arrays of the invention, see discussion below, including, e.g., capillary arrays. The probes of the invention can also be used to isolate other nucleic acids or polypeptides.
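As a hedged illustration of a probe comprising a stated number of consecutive bases, the sketch below enumerates every window of a chosen length along a sequence and computes a reverse complement. The function names, the default length of 10, and the restriction to unambiguous DNA bases (A, C, G, T) are assumptions of this sketch; it does not address probe specificity or melting behavior.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(sequence: str, length: int = 10) -> list:
    """Return every run of `length` consecutive bases in the sequence.

    Returns an empty list when the sequence is shorter than `length`.
    """
    seq = sequence.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]
```

For a 12-base sequence and a probe length of 10, three candidate windows result; applying `reverse_complement` to each yields the corresponding probes for the complementary strand.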

The isolated, synthetic or recombinant nucleic acids of the invention, the sequences complementary thereto, or a fragment comprising at least about 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of the invention, or the sequences complementary thereto may also be used as probes to determine whether a biological sample, such as a soil sample, contains an organism having a nucleic acid sequence of the invention or an organism from which the nucleic acid was obtained. In such procedures, a biological sample potentially harboring the organism from which the nucleic acid was isolated is obtained and nucleic acids are obtained from the sample. The nucleic acids are contacted with the probe under conditions which permit the probe to specifically hybridize to any complementary sequences which are present therein.

Where necessary, conditions which permit the probe to specifically hybridize to complementary sequences may be determined by placing the probe in contact with complementary sequences from samples known to contain the complementary sequence as well as control sequences which do not contain the complementary sequence. Hybridization conditions, such as the salt concentration of the hybridization buffer, the formamide concentration of the hybridization buffer, or the hybridization temperature, may be varied to identify conditions which allow the probe to hybridize specifically to complementary nucleic acids.

If the sample contains the organism from which the nucleic acid was isolated, specific hybridization of the probe is then detected. Hybridization may be detected by labeling the probe with a detectable agent such as a radioactive isotope, a fluorescent dye or an enzyme capable of catalyzing the formation of a detectable product.

Many methods for using the labeled probes to detect the presence of complementary nucleic acids in a sample are familiar to those skilled in the art. These include Southern Blots, Northern Blots, colony hybridization procedures and dot blots. Protocols for each of these procedures are provided in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1997) and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press (1989).

Alternatively, more than one probe (at least one of which is capable of specifically hybridizing to any complementary sequences which are present in the nucleic acid sample) may be used in an amplification reaction to determine whether the sample contains an organism containing a nucleic acid sequence of the invention (e.g., an organism from which the nucleic acid was isolated). In one aspect, the probes comprise oligonucleotides. In one aspect, the amplification reaction may comprise a PCR reaction. PCR protocols are described in Ausubel and Sambrook, supra. Alternatively, the amplification may comprise a ligase chain reaction, 3SR, or strand displacement reaction. (See Barany, F., “The Ligase Chain Reaction in a PCR World”, PCR Methods and Applications 1:5-16, 1991; E. Fahy et al., “Self-sustained Sequence Replication (3SR): An Isothermal Transcription-based Amplification System Alternative to PCR”, PCR Methods and Applications 1:25-33, 1991; and Walker G. T. et al., “Strand Displacement Amplification - an Isothermal in vitro DNA Amplification Technique”, Nucleic Acids Research 20:1691-1696, 1992). In such procedures, the nucleic acids in the sample are contacted with the probes, the amplification reaction is performed and any resulting amplification product is detected. The amplification product may be detected by performing gel electrophoresis on the reaction products and staining the gel with an intercalator such as ethidium bromide. Alternatively, one or more of the probes may be labeled with a radioactive isotope and the presence of a radioactive amplification product may be detected by autoradiography after gel electrophoresis.

Probes derived from sequences near the ends of the sequences of the invention may also be used in chromosome walking procedures to identify clones containing genomic sequences located adjacent to the sequences of the invention. Such methods allow the isolation of genes which encode additional proteins from the host organism.

In one aspect, the isolated, synthetic or recombinant nucleic acids of the invention, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive bases of one of the sequences of the invention, or the sequences complementary thereto are used as probes to identify and isolate related nucleic acids. In some aspects, the related nucleic acids may be cDNAs or genomic DNAs from organisms other than the one from which the nucleic acid was isolated. For example, the other organisms may be related organisms. In such procedures, a nucleic acid sample is contacted with the probe under conditions which permit the probe to specifically hybridize to related sequences. Hybridization of the probe to nucleic acids from the related organism is then detected using any of the methods described above.

By varying the stringency of the hybridization conditions used to identify nucleic acids, such as cDNAs or genomic DNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature, T_(m), is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly complementary probe. Very stringent conditions are selected to be equal to or about 5° C. lower than the T_(m) for a particular probe. The melting temperature of the probe may be calculated using the following formulas:

-   For probes between 14 and 70 nucleotides in length, the melting temperature (T_(m)) is calculated using the formula: T_(m)=81.5+16.6(log [Na+])+0.41(fraction G+C)−(600/N), where N is the length of the probe.
-   If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation: T_(m)=81.5+16.6(log [Na+])+0.41(fraction G+C)−0.63(% formamide)−(600/N), where N is the length of the probe.
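The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the described method: the logarithm is taken as base 10, [Na+] is in mol/L, and the G+C term is entered as a percentage (0 to 100), as in the commonly cited form of this equation, whereas the text writes "fraction G+C". The function names are hypothetical.

```python
import math


def gc_percent(probe: str) -> float:
    """Percentage (0-100) of G and C bases in the probe sequence."""
    seq = probe.upper()
    if not seq:
        raise ValueError("probe must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temperature(probe: str, na_molar: float,
                        formamide_percent: float = 0.0) -> float:
    """Estimate T_m in degrees C from the formulas in the text.

    Assumptions (not stated in the text): log is base 10, na_molar is
    in mol/L, and the G+C term is a percentage rather than a fraction.
    With formamide_percent == 0 this reduces to the first formula.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    n = len(probe)  # N, the length of the probe
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(probe)
            - 0.63 * formamide_percent
            - 600.0 / n)


if __name__ == "__main__":
    example = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% G+C
    print(round(melting_temperature(example, 1.0), 1))
    print(round(melting_temperature(example, 1.0, 50.0), 1))
```

The text limits the first formula to probes of 14 to 70 nucleotides; the sketch does not enforce that range, so a caller would need to check it.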

Prehybridization may be carried out in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 μg/ml denatured fragmented salmon sperm DNA or 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 μg/ml denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.

In one aspect, hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above. Where the probe comprises double stranded DNA, it is denatured before addition to the hybridization solution. In one aspect, the filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25° C. below the T_(m). For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10° C. below the T_(m). In one aspect, for hybridizations in 6×SSC, the hybridization is conducted at approximately 68° C. Usually, for hybridizations in 50% formamide containing solutions, the hybridization is conducted at approximately 42° C.
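The temperature offsets given above can be expressed as a small helper. This is a minimal sketch, assuming that "shorter probes" means any probe of 200 nucleotides or fewer (the text gives only oligonucleotide probes as an example); the function name is hypothetical.

```python
from typing import Tuple


def hybridization_window(tm_celsius: float, probe_length: int) -> Tuple[float, float]:
    """Return the (lowest, highest) suggested hybridization temperature in degrees C.

    Probes over 200 nucleotides: 15-25 degrees below T_m.
    Shorter probes (assumed here to be 200 nt or fewer): 5-10 degrees below T_m.
    """
    if probe_length <= 0:
        raise ValueError("probe_length must be positive")
    if probe_length > 200:
        return (tm_celsius - 25.0, tm_celsius - 15.0)
    return (tm_celsius - 10.0, tm_celsius - 5.0)


if __name__ == "__main__":
    print(hybridization_window(72.0, 20))   # oligonucleotide probe
    print(hybridization_window(90.0, 500))  # long probe
```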

Inhibiting Expression of Cellulase Enzymes

The invention provides nucleic acids complementary to (e.g., antisense sequences to) the nucleic acids of the invention, e.g., cellulase enzyme-encoding nucleic acids, e.g., nucleic acids comprising antisense, siRNA, miRNA, ribozymes. Nucleic acids of the invention comprising antisense sequences can be capable of inhibiting the transport, splicing or transcription of cellulase enzyme-encoding genes. The inhibition can be effected through the targeting of genomic DNA or messenger RNA. The transcription or function of targeted nucleic acid can be inhibited, for example, by hybridization and/or cleavage. One exemplary set of inhibitors provided by the present invention includes oligonucleotides which are able to either bind cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme gene or message, in either case preventing or inhibiting the production or function of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. The association can be through sequence specific hybridization. Another useful class of inhibitors includes oligonucleotides which cause inactivation or cleavage of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme message. The oligonucleotide can have enzyme activity which causes such cleavage, such as ribozymes. The oligonucleotide can be chemically modified or conjugated to an enzyme or composition capable of cleaving the complementary nucleic acid. A pool of many different such oligonucleotides can be screened for those with the desired activity. Thus, the invention provides various compositions for the inhibition of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme expression on a nucleic acid and/or protein level, e.g., antisense, siRNA, miRNA and ribozymes comprising cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme sequences of the invention and the anti-cellulase, e.g., anti-endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase antibodies of the invention.

Inhibition of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme expression can have a variety of industrial applications. For example, inhibition of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme expression can slow or prevent spoilage. In one aspect, compositions of the invention that inhibit the expression and/or activity of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes, e.g., antibodies, antisense oligonucleotides, ribozymes, siRNA and miRNA, are used to slow or prevent spoilage. Thus, in one aspect, the invention provides methods and compositions comprising application onto a plant or plant product (e.g., a cereal, a grain, a fruit, seed, root, leaf, etc.) of antibodies, antisense oligonucleotides, ribozymes, siRNA and miRNA of the invention to slow or prevent spoilage. These compositions also can be expressed by the plant (e.g., a transgenic plant) or another organism (e.g., a bacterium or other microorganism transformed with a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme gene of the invention).

The compositions of the invention for the inhibition of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme expression (e.g., antisense, iRNA, ribozymes, antibodies) can be used as pharmaceutical compositions, e.g., as anti-pathogen agents or in other therapies, e.g., as anti-microbials for, e.g., Salmonella.

Antisense Oligonucleotides

The invention provides antisense oligonucleotides capable of binding cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme message which, in one aspect, can inhibit cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity by targeting mRNA. Strategies for designing antisense oligonucleotides are well described in the scientific and patent literature, and the skilled artisan can design such cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme oligonucleotides using the novel reagents of the invention. For example, gene walking/RNA mapping protocols to screen for effective antisense oligonucleotides are well known in the art, see, e.g., Ho (2000) Methods Enzymol. 314:168-183, describing an RNA mapping assay, which is based on standard molecular techniques to provide an easy and reliable method for potent antisense sequence selection. See also Smith (2000) Eur. J. Pharm. Sci. 11:191-198.

Naturally occurring nucleic acids can be used as antisense oligonucleotides. The antisense oligonucleotides can be of any length; for example, in alternative aspects, the antisense oligonucleotides are between about 5 to 100, about 10 to 80, about 15 to 60, or about 18 to 40 nucleotides in length. The optimal length can be determined by routine screening. The antisense oligonucleotides can be present at any concentration. The optimal concentration can be determined by routine screening. A wide variety of synthetic, non-naturally occurring nucleotide and nucleic acid analogues are known which can address potential limitations of naturally occurring nucleic acids. For example, peptide nucleic acids (PNAs) containing non-ionic backbones, such as N-(2-aminoethyl) glycine units can be used. Antisense oligonucleotides having phosphorothioate linkages can also be used, as described in WO 97/03211; WO 96/39154; Mata (1997) Toxicol Appl Pharmacol 144:189-197; Antisense Therapeutics, ed. Agrawal (Humana Press, Totowa, N.J., 1996). Antisense oligonucleotides having synthetic DNA backbone analogues provided by the invention can also include phosphoro-dithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3′-thioacetal, methylene(methylimino), 3′-N-carbamate, and morpholino carbamate nucleic acids, as described above.
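As a rough illustration of how candidate antisense sequences of a chosen length could be enumerated for the routine screening mentioned above, the following sketch slides a window along a target message and reports the reverse complement of each window as a DNA oligonucleotide. The function name, the default length of 20 and the example sequence are hypothetical; the sketch says nothing about which candidates would actually be effective.

```python
from typing import Iterator, Tuple

# Maps each RNA base of the target to the complementary DNA base.
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_candidates(mrna: str, length: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (start, oligo) pairs for every window of `length` bases.

    `oligo` is the DNA reverse complement of the window, written 5' to 3'.
    The input may be given as RNA (U) or as the equivalent DNA (T).
    """
    target = mrna.upper().replace("T", "U")
    if length <= 0 or length > len(target):
        raise ValueError("length must be between 1 and the target length")
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        yield start, window.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical target fragment, for illustration only.
    for start, oligo in antisense_candidates("AUGGCUAGCUAGGAUCCGAU", length=18):
        print(start, oligo)
```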

Combinatorial chemistry methodology can be used to create vast numbers of oligonucleotides that can be rapidly screened for specific oligonucleotides that have appropriate binding affinities and specificities toward any target, such as the sense and antisense cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme sequences of the invention (see, e.g., Gold (1995) J. of Biol. Chem. 270:13581-13584).

Inhibitory Ribozymes

The invention provides ribozymes capable of binding cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme message. These ribozymes can inhibit cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity by, e.g., targeting mRNA. Strategies for designing ribozymes and selecting the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme-specific antisense sequence for targeting are well described in the scientific and patent literature, and the skilled artisan can design such ribozymes using the novel reagents of the invention. Ribozymes act by binding to a target RNA through the target RNA binding portion of a ribozyme which is held in close proximity to an enzymatic portion of the RNA that cleaves the target RNA. Thus, the ribozyme recognizes and binds a target RNA through complementary base-pairing, and once bound to the correct site, acts enzymatically to cleave and inactivate the target RNA. Cleavage of a target RNA in such a manner will destroy its ability to direct synthesis of an encoded protein if the cleavage occurs in the coding sequence. After a ribozyme has bound and cleaved its RNA target, it can be released from that RNA to bind and cleave new targets repeatedly.

In some circumstances, the enzymatic nature of a ribozyme can be advantageous over other technologies, such as antisense technology (where a nucleic acid molecule simply binds to a nucleic acid target to block its transcription, translation or association with another molecule) as the effective concentration of ribozyme necessary to effect a therapeutic treatment can be lower than that of an antisense oligonucleotide. This potential advantage reflects the ability of the ribozyme to act enzymatically. Thus, a single ribozyme molecule is able to cleave many molecules of target RNA. In one aspect, a ribozyme is a highly specific inhibitor, with the specificity of inhibition depending not only on the base pairing mechanism of binding, but also on the mechanism by which the molecule inhibits the expression of the RNA to which it binds. That is, the inhibition is caused by cleavage of the RNA target and so specificity is defined as the ratio of the rate of cleavage of the targeted RNA over the rate of cleavage of non-targeted RNA. This cleavage mechanism is dependent upon factors additional to those involved in base pairing. Thus, the specificity of action of a ribozyme can be greater than that of antisense oligonucleotide binding the same RNA site.

The ribozyme of the invention, e.g., an enzymatic ribozyme RNA molecule, can be formed in a hammerhead motif, a hairpin motif, a hepatitis delta virus motif, a group I intron motif and/or an RNaseP-like RNA in association with an RNA guide sequence. Examples of hammerhead motifs are described by, e.g., Rossi (1992) Aids Research and Human Retroviruses 8:183; hairpin motifs by Hampel (1989) Biochemistry 28:4929, and Hampel (1990) Nuc. Acids Res. 18:299; the hepatitis delta virus motif by Perrotta (1992) Biochemistry 31:16; the RNaseP motif by Guerrier-Takada (1983) Cell 35:849; and the group I intron by Cech U.S. Pat. No. 4,987,071. The recitation of these specific motifs is not intended to be limiting. Those skilled in the art will recognize that a ribozyme of the invention, e.g., an enzymatic RNA molecule of this invention, can have a specific substrate binding site complementary to one or more of the target gene RNA regions. A ribozyme of the invention can have a nucleotide sequence within or surrounding that substrate binding site which imparts an RNA cleaving activity to the molecule.

RNA Interference (RNAi)

In one aspect, the invention provides an RNA inhibitory molecule, a so-called “RNAi” molecule, comprising a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme sequence of the invention. The RNAi molecule can comprise a double-stranded RNA (dsRNA) molecule, e.g., siRNA and/or miRNA. The RNAi molecule, e.g., siRNA and/or miRNA, can inhibit expression of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme gene. In one aspect, the RNAi molecule, e.g., siRNA and/or miRNA, is about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex nucleotides in length. While the invention is not limited by any particular mechanism of action, the RNAi can enter a cell and cause the degradation of a single-stranded RNA (ssRNA) of similar or identical sequences, including endogenous mRNAs. When a cell is exposed to double-stranded RNA (dsRNA), mRNA from the homologous gene is selectively degraded by a process called RNA interference (RNAi). A possible basic mechanism behind RNAi is the breaking of a double-stranded RNA (dsRNA) matching a specific gene sequence into short pieces called short interfering RNA, which trigger the degradation of mRNA that matches their sequence. In one aspect, the RNAi's of the invention are used in gene-silencing therapeutics, see, e.g., Shuey (2002) Drug Discov. Today 7:1040-1046. In one aspect, the invention provides methods to selectively degrade RNA using the RNAi molecules, e.g., siRNA and/or miRNA, of the invention. The process may be practiced in vitro, ex vivo or in vivo. In one aspect, the RNAi molecules of the invention can be used to generate a loss-of-function mutation in a cell, an organ or an animal. Methods for making and using RNAi molecules, e.g., siRNA and/or miRNA, to selectively degrade RNA are well known in the art, see, e.g., U.S. Pat. Nos. 6,506,559; 6,511,824; 6,515,109; 6,489,127.
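To make the duplex lengths above concrete, the following sketch lists the sense and antisense RNA strands for every window of a target message. It is an assumption-laden illustration only: the default of 21 nucleotides is simply one value from the range given in the text, overhangs and any selection rules are ignored, and the function name and example sequence are hypothetical.

```python
from typing import List, Tuple

# Maps each RNA base to its Watson-Crick partner.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def sirna_duplexes(mrna: str, length: int = 21) -> List[Tuple[str, str]]:
    """Return (sense, antisense) RNA strand pairs, both written 5' to 3'.

    One pair is produced for every window of `length` bases in the target.
    """
    target = mrna.upper().replace("T", "U")
    if length <= 0 or length > len(target):
        raise ValueError("length must be between 1 and the target length")
    pairs = []
    for start in range(len(target) - length + 1):
        sense = target[start:start + length]
        antisense = sense.translate(_RNA_COMPLEMENT)[::-1]
        pairs.append((sense, antisense))
    return pairs


if __name__ == "__main__":
    # Hypothetical 26-nt target fragment, for illustration only.
    for sense, antisense in sirna_duplexes("AUGGCUAGCUAGGAUCCGAUCGAUCG"):
        print(sense, antisense)
```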

Modification of Nucleic Acids—Making Variant Enzymes of the Invention

The invention provides methods of generating variants of the nucleic acids of the invention, e.g., those encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. These methods can be repeated or used in various combinations to generate cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes having an altered or different activity or an altered or different stability from that of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme encoded by the template nucleic acid. These methods also can be repeated or used in various combinations, e.g., to generate variations in gene/message expression, message translation or message stability. In another aspect, the genetic composition of a cell is altered by, e.g., modification of a homologous gene ex vivo, followed by its reinsertion into the cell.

A nucleic acid of the invention can be altered by any means, for example, by random or stochastic methods, or by non-stochastic, or “directed evolution,” methods; see, e.g., U.S. Pat. No. 6,361,974. Methods for random mutation of genes are well known in the art, see, e.g., U.S. Pat. No. 5,830,696. For example, mutagens can be used to randomly mutate a gene. Mutagens include, e.g., ultraviolet light or gamma irradiation, or a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated psoralens, alone or in combination, to induce DNA breaks amenable to repair by recombination. Other chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid. Other mutagens are analogues of nucleotide precursors, e.g., nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine. These agents can be added to a PCR reaction in place of the nucleotide precursor thereby mutating the sequence. Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be used.
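The effect of random mutation on a sequence can be pictured with a toy in silico model. The sketch below is not a model of any particular mutagen or protocol named above; it only assumes that each base is independently replaced by a different base with a fixed probability, and the function name and rate parameter are hypothetical.

```python
import random


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of `seq` with random base substitutions.

    Each position is substituted independently with probability `rate`;
    a substituted base is always different from the original base.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    bases = "ACGT"
    mutated = []
    for base in seq.upper():
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in bases if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    template = "ATGGCTAGCTAGGATCCGAT"  # hypothetical template fragment
    library = [mutate(template, 0.05, rng) for _ in range(5)]
    for variant in library:
        print(variant)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which makes it easier to compare simulated variant libraries across runs.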

Any technique in molecular biology can be used, e.g., random PCR mutagenesis, see, e.g., Rice (1992) Proc. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial multiple cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques 18:194-196. Alternatively, nucleic acids, e.g., genes, can be reassembled after random, or “stochastic,” fragmentation, see, e.g., U.S. Pat. Nos. 6,291,242; 6,287,862; 6,287,861; 5,955,358; 5,830,721; 5,824,514; 5,811,238; 5,605,793. In alternative aspects, modifications, additions or deletions are introduced by error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR), recombination, recursive sequence recombination, phosphorothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation, Chromosomal Saturation Mutagenesis (CSM) and/or a combination of these and other methods.

The following publications describe a variety of recursive recombination procedures and/or methods which can be incorporated into the methods of the invention: Stemmer (1999) “Molecular breeding of viruses for targeting and other clinical properties” Tumor Targeting 4:1-4; Ness (1999) Nature Biotechnology 17:893-896; Chang (1999) “Evolution of a cytokine using DNA family shuffling” Nature Biotechnology 17:793-797; Minshull (1999) “Protein evolution by molecular breeding” Current Opinion in Chemical Biology 3:284-290; Christians (1999) “Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling” Nature Biotechnology 17:259-264; Crameri (1998) “DNA shuffling of a family of genes from diverse species accelerates directed evolution” Nature 391:288-291; Crameri (1997) “Molecular evolution of an arsenate detoxification pathway by DNA shuffling,” Nature Biotechnology 15:436-438; Zhang (1997) “Directed evolution of an effective fucosidase from a galactosidase by DNA shuffling and screening” Proc. Natl. Acad. Sci. USA 94:4504-4509; Patten et al. (1997) “Applications of DNA Shuffling to Pharmaceuticals and Vaccines” Current Opinion in Biotechnology 8:724-733; Crameri et al. (1996) “Construction and evolution of antibody-phage libraries by DNA shuffling” Nature Medicine 2:100-103; Gates et al. (1996) “Affinity selective isolation of ligands from peptide libraries through display on a lac repressor ‘headpiece dimer’” Journal of Molecular Biology 255:373-386; Stemmer (1996) “Sexual PCR and Assembly PCR” In: The Encyclopedia of Molecular Biology. VCH Publishers, New York. pp. 447-457; Crameri and Stemmer (1995) “Combinatorial multiple cassette mutagenesis creates all the permutations of mutant and wildtype cassettes” BioTechniques 18:194-195; Stemmer et al. (1995) “Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides” Gene 164:49-53; Stemmer (1995) “The Evolution of Molecular Computation” Science 270:1510; Stemmer (1995) “Searching Sequence Space” Bio/Technology 13:549-553; Stemmer (1994) “Rapid evolution of a protein in vitro by DNA shuffling” Nature 370:389-391; and Stemmer (1994) “DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution.” Proc. Natl. Acad. Sci. USA 91:10747-10751.

Mutational methods of generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) “Approaches to DNA mutagenesis: an overview” Anal Biochem. 254(2):157-178; Dale et al. (1996) “Oligonucleotide-directed random mutagenesis using the phosphorothioate method” Methods Mol. Biol. 57:369-374; Smith (1985) “In vitro mutagenesis” Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) “Strategies and applications of in vitro mutagenesis” Science 229:1193-1201; Carter (1986) “Site-directed mutagenesis” Biochem. J. 237:1-7; and Kunkel (1987) “The efficiency of oligonucleotide directed mutagenesis” in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis using uracil containing templates (Kunkel (1985) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Methods in Enzymol. 154:367-382; and Bass et al. (1988) “Mutant Trp repressors with new DNA-binding specificities” Science 242:240-245); oligonucleotide-directed mutagenesis (Methods in Enzymol. 100:468-500 (1983); Methods in Enzymol. 154:329-350 (1987); Zoller (1982) “Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment” Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) “Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors” Methods in Enzymol. 100:468-500; and Zoller (1987) “Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template” Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA mutagenesis (Taylor (1985) “The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA” Nucl. Acids Res. 13:8749-8764; Taylor (1985) “The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA” Nucl. Acids Res. 13:8765-8787; Nakamaye (1986) “Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis” Nucl. Acids Res. 14:9679-9698; Sayers (1988) “5′-3′ Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis” Nucl. Acids Res. 16:791-802; and Sayers et al. (1988) “Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide” Nucl. Acids Res. 16:803-814); mutagenesis using gapped duplex DNA (Kramer et al. (1984) “The gapped duplex DNA approach to oligonucleotide-directed mutation construction” Nucl. Acids Res. 12:9441-9456; Kramer & Fritz (1987) “Oligonucleotide-directed construction of mutations via gapped duplex DNA” Methods in Enzymol. 154:350-367; Kramer (1988) “Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations” Nucl. Acids Res. 16:7207; and Fritz (1988) “Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro” Nucl. Acids Res. 16:6987-6999).

Additional protocols that can be used to practice the invention include point mismatch repair (Kramer (1984) “Point Mismatch Repair” Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) “Improved oligonucleotide site-directed mutagenesis using M13 vectors” Nucl. Acids Res. 13:4431-4443; and Carter (1987) “Improved oligonucleotide-directed mutagenesis using M13 vectors” Methods in Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh (1986) “Use of oligonucleotides to generate large deletions” Nucl. Acids Res. 14:5115), restriction-selection and restriction-purification (Wells et al. (1986) “Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin” Phil. Trans. R. Soc. Lond. A 317:415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) “Total synthesis and cloning of a gene coding for the ribonuclease S protein” Science 223:1299-1301; Sakmar and Khorana (1988) “Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin)” Nucl. Acids Res. 14:6361-6372; Wells et al. (1985) “Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites” Gene 34:315-323; and Grundstrom et al. (1985) “Oligonucleotide-directed mutagenesis by microscale ‘shot-gun’ gene synthesis” Nucl. Acids Res. 13:3305-3316), double-strand break repair (Mandecki (1986) “Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis” Proc. Natl. Acad. Sci. USA 83:7177-7181; and Arnold (1993) “Protein engineering for unusual environments” Current Opinion in Biotechnology 4:450-455). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for trouble-shooting problems with various mutagenesis methods.

Protocols that can be used to practice the invention are described, e.g., in U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), “Methods for In Vitro Recombination;” U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998) “Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;” U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), “DNA Mutagenesis by Random Fragmentation and Reassembly;” U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) “End-Complementary Polymerase Reaction;” U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), “Methods and Compositions for Cellular and Metabolic Engineering;” WO 95/22625, Stemmer and Crameri, “Mutagenesis by Random Fragmentation and Reassembly;” WO 96/33207 by Stemmer and Lipschutz “End Complementary Polymerase Chain Reaction;” WO 97/20078 by Stemmer and Crameri “Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;” WO 97/35966 by Minshull and Stemmer, “Methods and Compositions for Cellular and Metabolic Engineering;” WO 99/41402 by Punnonen et al. “Targeting of Genetic Vaccine Vectors;” WO 99/41383 by Punnonen et al. “Antigen Library Immunization;” WO 99/41369 by Punnonen et al. “Genetic Vaccine Vector Engineering;” WO 99/41368 by Punnonen et al. “Optimization of Immunomodulatory Properties of Genetic Vaccines;” EP 752008 by Stemmer and Crameri, “DNA Mutagenesis by Random Fragmentation and Reassembly;” EP 0932670 by Stemmer “Evolving Cellular DNA Uptake by Recursive Sequence Recombination;” WO 99/23107 by Stemmer et al., “Modification of Virus Tropism and Host Range by Viral Genome Shuffling;” WO 99/21979 by Apt et al., “Human Papillomavirus Vectors;” WO 98/31837 by del Cardayre et al. “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination;” WO 98/27230 by Patten and Stemmer, “Methods and Compositions for Polypeptide Engineering;” WO 98/27230 by Stemmer et al., “Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and Selection,” WO 00/00632, “Methods for Generating Highly Diverse Libraries,” WO 00/09679, “Methods for Obtaining in Vitro Recombined Polynucleotide Sequence Banks and Resulting Sequences,” WO 98/42832 by Arnold et al., “Recombination of Polynucleotide Sequences Using Random or Defined Primers,” WO 99/29902 by Arnold et al., “Method for Creating Polynucleotide and Polypeptide Sequences,” WO 98/41653 by Vind, “An in Vitro Method for Construction of a DNA Library,” WO 98/41622 by Borchert et al., “Method for Constructing a Library Using DNA Shuffling,” and WO 98/42727 by Pati and Zarling, “Sequence Alterations using Homologous Recombination.”

Protocols that can be used to practice the invention (providing details regarding various diversity generating methods) are described, e.g., in U.S. patent application Ser. No. 09/407,800, “SHUFFLING OF CODON ALTERED GENES” by Patten et al. filed Sep. 28, 1999; “EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION” by del Cardayre et al., U.S. Pat. No. 6,379,964; “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., U.S. Pat. Nos. 6,319,714; 6,368,861; 6,376,246; 6,423,542; 6,426,224 and PCT/US00/01203; “USE OF CODON-VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., U.S. Pat. No. 6,436,675; “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202) and, e.g., “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579); “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer, filed Jan. 18, 2000 (PCT/US00/01138); and “SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION” by Affholter, filed Sep. 6, 2000 (U.S. Ser. No. 09/656,549); and U.S. Pat. Nos. 6,177,263; 6,153,410.

Non-stochastic, or “directed evolution,” methods, including, e.g., saturation mutagenesis such as Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR), or a combination thereof, are used to modify the nucleic acids of the invention to generate cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes with new or altered properties (e.g., activity under highly acidic or alkaline conditions, high or low temperatures, and the like). Polypeptides encoded by the modified nucleic acids can be screened for an activity before testing for glucan hydrolysis or other activity. Any testing modality or protocol can be used, e.g., using a capillary array platform. See, e.g., U.S. Pat. Nos. 6,361,974; 6,280,926; 5,939,250.

Gene Site Saturation Mutagenesis, or, GSSM

The invention also provides methods for making enzymes using Gene Site Saturation Mutagenesis, or GSSM, as described herein, and also in U.S. Pat. Nos. 6,171,820 and 6,579,258. FIG. 11 is a diagram illustrating the use of a gene site-saturation mutagenesis (GSSM) approach for achieving all possible amino acid changes at each amino acid site along the polypeptide. The oligos used are comprised of a homologous sequence, a triplet sequence composed of degenerate N,N,G/T, and another homologous sequence. Thus, the degeneracy of each oligo is derived from the degeneracy of the N,N,G/T cassette contained therein. The resultant polymerization products from the use of such oligos include all possible amino acid changes at each amino acid site along the polypeptide, because the N,N,G/T sequence is able to code for all 20 amino acids. As shown, a separate degenerate oligo is used for mutagenizing each codon in a polynucleotide encoding a polypeptide.
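The oligo layout described above (homologous sequence, degenerate triplet, homologous sequence, with one oligo per codon) can be sketched as follows. This is only an illustrative sketch: the function name `design_gssm_oligos`, the `flank` length, and the parent sequence are hypothetical, and "NNK" is used as the IUPAC shorthand for N,N,G/T. Practical primer design would also weigh melting temperature, strand choice, and similar factors that are not modeled here.

```python
def design_gssm_oligos(coding_sequence, flank=15, cassette="NNK"):
    """Return one degenerate oligo per codon of the coding sequence.

    Each oligo is: upstream homologous flank + degenerate cassette +
    downstream homologous flank. Flanks are truncated at the sequence ends.
    """
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    oligos = []
    for start in range(0, len(coding_sequence), 3):
        left = coding_sequence[max(0, start - flank):start]
        right = coding_sequence[start + 3:start + 3 + flank]
        oligos.append(left + cassette + right)
    return oligos


# Hypothetical 6-codon parent sequence, used only for illustration.
parent = "ATGGCTAAAGGTCTGGAA"
oligos = design_gssm_oligos(parent, flank=6)
for index, oligo in enumerate(oligos, start=1):
    print(f"codon {index}: {oligo}")
```

The number of oligos equals the number of codons in the parent sequence, which mirrors the statement that a separate degenerate oligo is used for each codon.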

In one aspect, codon primers containing a degenerate N,N,G/T sequence are used to introduce point mutations into a polynucleotide, e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme or an antibody of the invention, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position, e.g., an amino acid residue in an enzyme active site or ligand binding site targeted to be modified. These oligonucleotides can comprise a contiguous first homologous sequence, a degenerate N,N,G/T sequence, and, optionally, a second homologous sequence. The downstream progeny translational products from the use of such oligonucleotides include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,G/T sequence includes codons for all 20 amino acids. In one aspect, one such degenerate oligonucleotide (comprised of, e.g., one degenerate N,N,G/T cassette) is used for subjecting each original codon in a parental polynucleotide template to a full range of codon substitutions. In another aspect, at least two degenerate cassettes are used, either in the same oligonucleotide or not, for subjecting at least two original codons in a parental polynucleotide template to a full range of codon substitutions. For example, more than one N,N,G/T sequence can be contained in one oligonucleotide to introduce amino acid mutations at more than one site. This plurality of N,N,G/T sequences can be directly contiguous, or separated by one or more additional nucleotide sequence(s). In another aspect, oligonucleotides serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,G/T sequence, to introduce any combination or permutation of amino acid additions, deletions, and/or substitutions.

In one aspect, simultaneous mutagenesis of two or more contiguous amino acid positions is done using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e., a degenerate (N,N,G/T)n sequence. In another aspect, degenerate cassettes having less degeneracy than the N,N,G/T sequence are used. For example, it may be desirable in some instances to use (e.g., in an oligonucleotide) a degenerate triplet sequence comprised of only one N, where said N can be in the first, second or third position of the triplet. Any other bases including any combinations and permutations thereof can be used in the remaining two positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g., in an oligo) a degenerate N,N,N triplet sequence.

In one aspect, use of degenerate triplets (e.g., N,N,G/T triplets) allows for systematic and easy generation of a full range of possible natural amino acids (for a total of 20 amino acids) into each and every amino acid position in a polypeptide (in alternative aspects, the methods also include generation of less than all possible substitutions per amino acid residue, or codon, position). For example, for a 100 amino acid polypeptide, 2000 distinct species (i.e., 20 possible amino acids per position × 100 amino acid positions) can be generated. Through the use of an oligonucleotide or set of oligonucleotides containing a degenerate N,N,G/T triplet, 32 individual sequences can code for all 20 possible natural amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is subjected to saturation mutagenesis using at least one such oligonucleotide, there are generated 32 distinct progeny polynucleotides encoding 20 distinct polypeptides. In contrast, the use of a non-degenerate oligonucleotide in site-directed mutagenesis leads to only one progeny polypeptide product per reaction vessel. Nondegenerate oligonucleotides can optionally be used in combination with degenerate primers disclosed; for example, nondegenerate oligonucleotides can be used to generate specific point mutations in a working polynucleotide. This provides one means to generate specific silent point mutations, point mutations leading to corresponding amino acid changes, and point mutations that cause the generation of stop codons and the corresponding expression of polypeptide fragments.
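The counts quoted above (32 sequences, 20 amino acids, 2000 species for a 100-residue polypeptide) can be checked by enumerating the N,N,G/T triplet against the standard genetic code. The sketch below is illustrative only; the helper name `summarize_triplet` is hypothetical, and the standard code is an assumption (organisms with alternative codes would give different results). Under the standard code, one of the 32 triplets, TAG, is a stop codon, so 31 triplets encode the 20 amino acids.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def summarize_triplet(first, second, third):
    """Count the codons, distinct amino acids, and stop codons of a degenerate triplet.

    Each argument is a string of the bases allowed at that codon position.
    """
    codons = ["".join(c) for c in product(first, second, third)]
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), len(amino_acids), stops


N = "ACGT"
codon_count, aa_count, stop_codons = summarize_triplet(N, N, "GT")
print(codon_count, aa_count, stop_codons)  # 32 20 ['TAG']

# Single-substitution species for a 100-residue polypeptide, as in the text.
species = 20 * 100
print(species)  # 2000
```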

In one aspect, each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide (e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes) molecules such that all 20 natural amino acids are represented at the one specific amino acid position corresponding to the codon position mutagenized in the parental polynucleotide (other aspects use less than all 20 natural combinations). The 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g., cloned into a suitable host, e.g., E. coli host, using, e.g., an expression vector) and subjected to expression screening. When an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide, such as increased glucan hydrolysis activity under alkaline or acidic conditions), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein.

In one aspect, upon mutagenizing each and every amino acid position in a parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid changes may be identified at more than one amino acid position. One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid, and each of two favorable changes) and 3 positions. Thus, there are 3×3×3 or 27 total possibilities, including 7 that were previously examined: 6 single point mutations (i.e., 2 at each of three positions) and no change at any position.
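The 3×3×3 arithmetic above can be reproduced by enumerating the combinations directly. In this sketch the residue positions and substitutions in `favorable` are hypothetical placeholders, and `None` stands for "no change from the original amino acid".

```python
from itertools import product

# Hypothetical favorable substitutions found at three positions.
favorable = {10: ["A", "V"], 42: ["K", "R"], 77: ["D", "E"]}

# Each position offers: no change (None) or one of its two favorable changes.
options = [[None] + substitutions for substitutions in favorable.values()]
combinations = list(product(*options))


def changes(combo):
    """Number of positions that differ from the parental amino acid."""
    return sum(choice is not None for choice in combo)


previously_examined = [c for c in combinations if changes(c) <= 1]
new_combinations = [c for c in combinations if changes(c) >= 2]

print(len(combinations))         # 27
print(len(previously_examined))  # 7 (parent plus 6 single point mutations)
print(len(new_combinations))     # 20
```

Only the 20 combinations carrying two or three substitutions would be new molecules to construct and screen.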

In yet another aspect, site-saturation mutagenesis can be used together with shuffling, chimerization, recombination and other mutagenizing processes, along with screening. This invention provides for the use of any mutagenizing process(es), including saturation mutagenesis, in an iterative manner. In one exemplification, the iterative use of any mutagenizing process(es) is used in combination with screening.

The invention also provides for the use of proprietary codon primers (containing a degenerate N,N,N sequence) to introduce point mutations into a polynucleotide, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position (Gene Site Saturation Mutagenesis (GSSM)). The oligos used are comprised contiguously of a first homologous sequence, a degenerate N,N,N sequence and, in one aspect but not necessarily, a second homologous sequence. The downstream progeny translational products from the use of such oligos include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,N sequence includes codons for all 20 amino acids.

In one aspect, one such degenerate oligo (comprised of one degenerate N,N,N cassette) is used for subjecting each original codon in a parental polynucleotide template to a full range of codon substitutions. In another aspect, at least two degenerate N,N,N cassettes are used, either in the same oligo or not, for subjecting at least two original codons in a parental polynucleotide template to a full range of codon substitutions. Thus, more than one N,N,N sequence can be contained in one oligo to introduce amino acid mutations at more than one site. This plurality of N,N,N sequences can be directly contiguous, or separated by one or more additional nucleotide sequence(s). In another aspect, oligos serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,N sequence, to introduce any combination or permutation of amino acid additions, deletions and/or substitutions.

In one aspect, it is possible to simultaneously mutagenize two or more contiguous amino acid positions using an oligo that contains contiguous N,N,N triplets, i.e., a degenerate (N,N,N)n sequence. In another aspect, the present invention provides for the use of degenerate cassettes having less degeneracy than the N,N,N sequence. For example, it may be desirable in some instances to use (e.g., in an oligo) a degenerate triplet sequence comprised of only one N, where the N can be in the first, second or third position of the triplet. Any other bases including any combinations and permutations thereof can be used in the remaining two positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g., in an oligo) a degenerate N,N,N triplet sequence, an N,N,G/T triplet sequence, or an N,N,G/C triplet sequence.

In one aspect, use of a degenerate triplet (such as N,N,G/T or an N,N,G/C triplet sequence) is advantageous for several reasons. In one aspect, this invention provides a means to systematically and fairly easily generate the substitution of the full range of possible amino acids (for a total of 20 amino acids) into each and every amino acid position in a polypeptide. Thus, for a 100 amino acid polypeptide, the invention provides a way to systematically and fairly easily generate 2000 distinct species (i.e., 20 possible amino acids per position times 100 amino acid positions). It is appreciated that there is provided, through the use of an oligo containing a degenerate N,N,G/T or an N,N,G/C triplet sequence, 32 individual sequences that code for 20 possible amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is subjected to saturation mutagenesis using one such oligo, there are generated 32 distinct progeny polynucleotides encoding 20 distinct polypeptides. In contrast, the use of a non-degenerate oligo in site-directed mutagenesis leads to only one progeny polypeptide product per reaction vessel.

This invention also provides for the use of nondegenerate oligos, which can optionally be used in combination with degenerate primers disclosed. It is appreciated that in some situations, it is advantageous to use nondegenerate oligos to generate specific point mutations in a working polynucleotide. This provides a means to generate specific silent point mutations, point mutations leading to corresponding amino acid changes and point mutations that cause the generation of stop codons and the corresponding expression of polypeptide fragments.

Thus, in one aspect of this invention, each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide molecules such that all 20 amino acids are represented at the one specific amino acid position corresponding to the codon position mutagenized in the parental polynucleotide. The 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g., cloned into a suitable E. coli host using an expression vector) and subjected to expression screening. When an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein.

In one aspect, upon mutagenizing each and every amino acid position in a parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid changes are identified at more than one amino acid position. One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid and each of two favorable changes) and 3 positions. Thus, there are 3×3×3 or 27 total possibilities, including 7 that were previously examined: 6 single point mutations (i.e., 2 at each of three positions) and no change at any position.

The invention provides for the use of saturation mutagenesis in combination with additional mutagenization processes, such as a process in which two or more related polynucleotides are introduced into a suitable host cell such that a hybrid polynucleotide is generated by recombination and reductive reassortment.

In addition to performing mutagenesis along the entire sequence of a gene, the instant invention provides that mutagenesis can be used to replace each of any number of bases in a polynucleotide sequence, wherein the number of bases to be mutagenized is in one aspect every integer from 15 to 100,000. Thus, instead of mutagenizing every position along a molecule, one can subject every or a discrete number of bases (in one aspect a subset totaling from 15 to 100,000) to mutagenesis. In one aspect, a separate nucleotide is used for mutagenizing each position or group of positions along a polynucleotide sequence. A group of 3 positions to be mutagenized may be a codon. The mutations can be introduced using a mutagenic primer, containing a heterologous cassette, also referred to as a mutagenic cassette. Exemplary cassettes can have from 1 to 500 bases. Each nucleotide position in such heterologous cassettes can be N, A, C, G, T, A/C, A/G, A/T, C/G, C/T, G/T, C/G/T, A/G/T, A/C/T, A/C/G, or E, where E is any base that is not A, C, G, or T (E can be referred to as a designer oligo).
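A heterologous cassette written in the position-by-position notation above (N, A/C, C/G/T, and so on) can be expanded into the concrete sequences it represents. The sketch below is an assumption-laden illustration: the space-separated cassette format and the function names are hypothetical, and the non-standard "E" base is deliberately not modeled.

```python
from itertools import product


def parse_cassette(spec):
    """Turn a spec such as 'A/C G C/G/T' into per-position base sets.

    Positions are separated by spaces; 'N' means any of A, C, G, T.
    """
    positions = []
    for token in spec.split():
        if token == "N":
            positions.append("ACGT")
            continue
        bases = token.split("/")
        if not all(len(base) == 1 and base in "ACGT" for base in bases):
            raise ValueError(f"unsupported position: {token!r}")
        positions.append("".join(bases))
    return positions


def expand_cassette(spec):
    """List every concrete sequence encoded by a degenerate cassette."""
    return ["".join(seq) for seq in product(*parse_cassette(spec))]


print(len(expand_cassette("N N G/T")))  # 32
print(expand_cassette("A/C G C/G/T"))
# ['AGC', 'AGG', 'AGT', 'CGC', 'CGG', 'CGT']
```

The size of the expansion is the product of the number of bases allowed at each position, which is why restricting even one position sharply reduces the degeneracy of a cassette.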

In one aspect, saturation mutagenesis is comprised of mutagenizing a complete set of mutagenic cassettes (wherein each cassette is in one aspect about 1-500 bases in length) in a defined polynucleotide sequence to be mutagenized (wherein the sequence to be mutagenized is in one aspect from about 15 to 100,000 bases in length). Thus, a group of mutations (ranging from 1 to 100 mutations) is introduced into each cassette to be mutagenized. A grouping of mutations to be introduced into one cassette can be different from or the same as a second grouping of mutations to be introduced into a second cassette during the application of one round of saturation mutagenesis. Such groupings are exemplified by deletions, additions, groupings of particular codons and groupings of particular nucleotide cassettes.

In one aspect, defined sequences to be mutagenized include a whole gene, pathway, cDNA, an entire open reading frame (ORF) and an entire promoter, enhancer, repressor/transactivator, origin of replication, intron, operator, or any polynucleotide functional group. Generally, a “defined sequence” for this purpose may be any polynucleotide that is a 15 base-polynucleotide sequence, and polynucleotide sequences of lengths between 15 bases and 15,000 bases (this invention specifically names every integer in between). Considerations in choosing groupings of codons include types of amino acids encoded by a degenerate mutagenic cassette.

In one aspect, for a grouping of mutations that can be introduced into a mutagenic cassette, this invention specifically provides for degenerate codon substitutions (using degenerate oligos) that code for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 amino acids at each position and a library of polypeptides encoded thereby.

Synthetic Ligation Reassembly (SLR)

The invention provides a non-stochastic gene modification system termed “synthetic ligation reassembly,” or simply “SLR,” a “directed evolution process,” to generate polypeptides, e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes or antibodies of the invention, with new or altered properties.

SLR is a method of ligating oligonucleotide fragments together non-stochastically. This method differs from stochastic oligonucleotide shuffling in that the nucleic acid building blocks are not shuffled, concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., U.S. Pat. Nos. 6,773,900; 6,740,506; 6,713,282; 6,635,449; 6,605,449; 6,537,776. In one aspect, SLR comprises the following steps: (a) providing a template polynucleotide, wherein the template polynucleotide comprises sequence encoding a homologous gene; (b) providing a plurality of building block polynucleotides, wherein the building block polynucleotides are designed to cross-over reassemble with the template polynucleotide at a predetermined sequence, and a building block polynucleotide comprises a sequence that is a variant of the homologous gene and a sequence homologous to the template polynucleotide flanking the variant sequence; (c) combining a building block polynucleotide with a template polynucleotide such that the building block polynucleotide cross-over reassembles with the template polynucleotide to generate polynucleotides comprising homologous gene sequence variations.

SLR does not depend on the presence of high levels of homology between polynucleotides to be rearranged. Thus, this method can be used to non-stochastically generate libraries (or sets) of progeny molecules comprised of over 10¹⁰⁰ different chimeras. SLR can be used to generate libraries comprised of over 10¹⁰⁰⁰ different progeny chimeras. Thus, aspects of the present invention include non-stochastic methods of producing a set of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. This method includes the steps of generating by design a plurality of specific nucleic acid building blocks having serviceable mutually compatible ligatable ends, and assembling these nucleic acid building blocks, such that a designed overall assembly order is achieved.

The mutually compatible ligatable ends of the nucleic acid building blocks to be assembled are considered to be “serviceable” for this type of ordered assembly if they enable the building blocks to be coupled in predetermined orders. Thus, the overall assembly order in which the nucleic acid building blocks can be coupled is specified by the design of the ligatable ends. If more than one assembly step is to be used, then the overall assembly order in which the nucleic acid building blocks can be coupled is also specified by the sequential order of the assembly step(s). In one aspect, the annealed building pieces are treated with an enzyme, such as a ligase (e.g., T4 DNA ligase), to achieve covalent bonding of the building pieces.
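The idea that the design of the ligatable ends fixes the assembly order can be illustrated with a simplified model. In this sketch each building block is a tuple of (left end label, body, right end label), and two blocks can couple only when a right end label matches a left end label. The labels, the `START`/`END` markers, and the function name are hypothetical simplifications; real overhang chemistry and ligation efficiency are not modeled.

```python
def assemble(blocks, start="START", end="END"):
    """Join building blocks in the single order permitted by their end labels.

    blocks: iterable of (left_end, body, right_end) tuples.
    Raises ValueError if the ends do not define exactly one full-length chain.
    """
    by_left_end = {}
    for left, body, right in blocks:
        if left in by_left_end:
            raise ValueError(f"ambiguous design: end {left!r} used twice")
        by_left_end[left] = (body, right)

    bodies, current = [], start
    while current != end:
        if current not in by_left_end:
            raise ValueError(f"no block is compatible with end {current!r}")
        body, current = by_left_end.pop(current)
        bodies.append(body)
    if by_left_end:
        raise ValueError("some blocks cannot be incorporated")
    return "".join(bodies)


# Blocks supplied in arbitrary order; the end labels alone determine the product.
blocks = [
    ("b", "GGG", "END"),
    ("START", "AAA", "a"),
    ("a", "CCC", "b"),
]
print(assemble(blocks))  # AAACCCGGG
```

Because every end label is used by exactly one block, the input order is irrelevant, which is the sense in which the assembly is non-stochastic in this toy model.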

In one aspect, the design of the oligonucleotide building blocks is obtained by analyzing a set of progenitor nucleic acid sequence templates that serve as a basis for producing a progeny set of finalized chimeric polynucleotides. These parental oligonucleotide templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, e.g., chimerized or shuffled. In one aspect of this method, the sequences of a plurality of parental nucleic acid templates are aligned in order to select one or more demarcation points. The demarcation points can be located at an area of homology, and are comprised of one or more nucleotides. These demarcation points are in one aspect shared by at least two of the progenitor templates. The demarcation points can thereby be used to delineate the boundaries of oligonucleotide building blocks to be generated in order to rearrange the parental polynucleotides. The demarcation points identified and selected in the progenitor molecules serve as potential chimerization points in the assembly of the final chimeric progeny molecules. A demarcation point can be an area of homology (comprised of at least one homologous nucleotide base) shared by at least two parental polynucleotide sequences. Alternatively, a demarcation point can be an area of homology that is shared by at least half of the parental polynucleotide sequences, or it can be an area of homology that is shared by at least two thirds of the parental polynucleotide sequences. In one aspect, a serviceable demarcation point is an area of homology that is shared by at least three fourths of the parental polynucleotide sequences, or it can be shared by almost all of the parental polynucleotide sequences. In one aspect, a demarcation point is an area of homology that is shared by all of the parental polynucleotide sequences.
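Selecting candidate demarcation points from aligned parental sequences can be sketched as a column-by-column scan. The example below assumes the parents are already aligned to equal length (with "-" for gaps), treats a single shared nucleotide as the smallest area of homology, and uses a hypothetical `min_fraction` parameter to express thresholds such as "at least half" or "all" of the parents. The sequences are invented for illustration.

```python
from collections import Counter


def demarcation_points(aligned_parents, min_fraction=1.0):
    """Return alignment columns where one base is shared by enough parents.

    A column qualifies when its most common base occurs in at least two
    parents and in at least `min_fraction` of all parents.
    """
    length = len(aligned_parents[0])
    if any(len(seq) != length for seq in aligned_parents):
        raise ValueError("parents must be aligned to the same length")

    points = []
    for column in range(length):
        bases = [seq[column] for seq in aligned_parents if seq[column] != "-"]
        if not bases:
            continue
        _, count = Counter(bases).most_common(1)[0]
        if count >= 2 and count / len(aligned_parents) >= min_fraction:
            points.append(column)
    return points


parents = ["ATGGCA", "ATGACA", "CTGGCT"]  # hypothetical aligned parents
print(demarcation_points(parents, min_fraction=1.0))  # [1, 2, 4]
print(demarcation_points(parents, min_fraction=0.5))  # [0, 1, 2, 3, 4, 5]
```

Lowering the threshold admits more candidate points, at the cost of points that some parents cannot use.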

In one aspect, a ligation reassembly process is performed exhaustively in order to generate an exhaustive library of progeny chimeric polynucleotides. In other words, all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules. At the same time, in another aspect, the assembly order (i.e., the order of assembly of each building block in the 5′ to 3′ sequence of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic) as described above. Because of the non-stochastic nature of this invention, the possibility of unwanted side products is greatly reduced.
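An exhaustive library in the sense above contains every ordered combination of one building block per position. The sketch below models this under stated assumptions: `slots` is a hypothetical list in which each entry holds the alternative building blocks for one position, in 5′ to 3′ order, and the sequences are invented. The final lines are plain arithmetic showing how quickly such a library grows; they are not a derivation taken from the source.

```python
import math
from itertools import product


def exhaustive_library(slots):
    """Every chimera formed by choosing one building block per slot, in slot order."""
    return ["".join(choice) for choice in product(*slots)]


def library_size(slots):
    """Number of chimeras, computed without enumerating them."""
    return math.prod(len(alternatives) for alternatives in slots)


# Hypothetical building blocks for three positions (5' to 3').
slots = [["ATG", "ATC"], ["GCA", "GCT", "GCC"], ["TAA"]]
library = exhaustive_library(slots)
print(len(library), library_size(slots))  # 6 6
print(library[0], library[-1])            # ATGGCATAA ATCGCCTAA

# Arithmetic only: 10 alternatives at each of 100 positions gives 10**100 chimeras.
print(math.prod([10] * 100) == 10 ** 100)  # True
```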

In another aspect, the ligation reassembly method is performed systematically. For example, the method is performed in order to generate a systematically compartmentalized library of progeny molecules, with compartments that can be screened systematically, e.g., one by one. In other words, this invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, a design can be achieved where specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, these methods allow a potentially very large number of progeny molecules to be examined systematically in smaller groups. Because of its ability to perform chimerizations in a manner that is highly flexible yet exhaustive and systematic as well, particularly when there is a low level of homology among the progenitor molecules, these methods provide for the generation of a library (or set) comprised of a large number of progeny molecules. Because of the non-stochastic nature of the instant ligation reassembly invention, the progeny molecules generated in one aspect comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. The saturation mutagenesis and optimized directed evolution methods also can be used to generate different progeny molecular species. It is appreciated that the invention provides freedom of choice and control regarding the selection of demarcation points, the size and number of the nucleic acid building blocks, and the size and design of the couplings. It is appreciated, furthermore, that the requirement for intermolecular homology is highly relaxed for the operability of this invention. In fact, demarcation points can even be chosen in areas of little or no intermolecular homology.
For example, because of codon wobble, i.e., the degeneracy of codons, nucleotide substitutions can be introduced into nucleic acid building blocks without altering the amino acid originally encoded in the corresponding progenitor template. Alternatively, a codon can be altered such that the coding for an original amino acid is altered. This invention provides that such substitutions can be introduced into the nucleic acid building block in order to increase the incidence of intermolecular homologous demarcation points and thus to allow an increased number of couplings to be achieved among the building blocks, which in turn allows a greater number of progeny chimeric molecules to be generated.

Synthetic Gene Reassembly

In one aspect, the present invention provides a non-stochastic method termed synthetic gene reassembly, that is somewhat related to stochastic shuffling, save that the nucleic acid building blocks are not shuffled or concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., U.S. Pat. No. 6,537,776.

The synthetic gene reassembly method does not depend on the presence of a high level of homology between polynucleotides to be shuffled. The invention can be used to non-stochastically generate libraries (or sets) of progeny molecules comprised of over 10¹⁰⁰ different chimeras. Conceivably, synthetic gene reassembly can even be used to generate libraries comprised of over 10¹⁰⁰⁰ different progeny chimeras.

Thus, in one aspect, the invention provides a non-stochastic method of producing a set of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design, which method is comprised of the steps of generating by design a plurality of specific nucleic acid building blocks having serviceable mutually compatible ligatable ends and assembling these nucleic acid building blocks, such that a designed overall assembly order is achieved.

The mutually compatible ligatable ends of the nucleic acid building blocks to be assembled are considered to be “serviceable” for this type of ordered assembly if they enable the building blocks to be coupled in predetermined orders. Thus, in one aspect, the overall assembly order in which the nucleic acid building blocks can be coupled is specified by the design of the ligatable ends and, if more than one assembly step is to be used, then the overall assembly order in which the nucleic acid building blocks can be coupled is also specified by the sequential order of the assembly step(s). In one aspect of the invention, the annealed building pieces are treated with an enzyme, such as a ligase (e.g., T4 DNA ligase) to achieve covalent bonding of the building pieces.

In another aspect, the design of nucleic acid building blocks is obtained upon analysis of the sequences of a set of progenitor nucleic acid templates that serve as a basis for producing a progeny set of finalized chimeric nucleic acid molecules. These progenitor nucleic acid templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, i.e., chimerized or shuffled.

In one exemplification, the invention provides for the chimerization of a family of related genes and their encoded family of related products. In a particular exemplification, the encoded products are enzymes. The cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the present invention can be mutagenized in accordance with the methods described herein.

Thus, according to one aspect of the invention, the sequences of a plurality of progenitor nucleic acid templates (e.g., polynucleotides of the invention) are aligned in order to select one or more demarcation points, which demarcation points can be located at an area of homology. The demarcation points can be used to delineate the boundaries of nucleic acid building blocks to be generated. Thus, the demarcation points identified and selected in the progenitor molecules serve as potential chimerization points in the assembly of the progeny molecules.

In one aspect, a serviceable demarcation point is an area of homology (comprised of at least one homologous nucleotide base) shared by at least two progenitor templates, but the demarcation point can be an area of homology that is shared by at least half of the progenitor templates, at least two thirds of the progenitor templates, at least three fourths of the progenitor templates and, in one aspect, almost all of the progenitor templates. In still another aspect, a serviceable demarcation point is an area of homology that is shared by all of the progenitor templates.

In one aspect, the gene reassembly process is performed exhaustively in order to generate an exhaustive library. In other words, all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules. At the same time, the assembly order (i.e., the order of assembly of each building block in the 5′ to 3′ sequence of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic). Because of the non-stochastic nature of the method, the possibility of unwanted side products is greatly reduced.
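As an illustration of what "exhaustive" means here, the following sketch enumerates every ordered combination of building blocks when the position of each block in the 5′ to 3′ order is fixed by design. The block labels and the grouping of alternatives by position are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical building blocks: one list of alternatives per position,
# listed in the designed 5' to 3' assembly order.
blocks_by_position = [
    ["A1", "B1", "C1"],  # position 1: one block from each of three progenitors
    ["A2", "B2"],        # position 2: two distinct alternatives
    ["A3", "B3", "C3"],  # position 3
]


def exhaustive_library(blocks):
    """Every ordered combination, taking exactly one block per position."""
    return [tuple(combo) for combo in product(*blocks)]


library = exhaustive_library(blocks_by_position)
# The library size is the product of the number of alternatives per position.
expected_size = prod(len(position) for position in blocks_by_position)
print(len(library), expected_size)
```

Because the position of each block is fixed, no combination is produced twice and none is omitted, which is the sense in which the assembly is non-stochastic.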

In another aspect, the method provides that the gene reassembly process is performed systematically, for example to generate a systematically compartmentalized library, with compartments that can be screened systematically, e.g., one by one. In other words, the invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, an experimental design can be achieved where specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, it allows a potentially very large number of progeny molecules to be examined systematically in smaller groups.

Because of its ability to perform chimerizations in a manner that is highly flexible yet exhaustive and systematic as well, particularly when there is a low level of homology among the progenitor molecules, the instant invention provides for the generation of a library (or set) comprised of a large number of progeny molecules. Because of the non-stochastic nature of the instant gene reassembly invention, the progeny molecules generated in one aspect comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. In a particular aspect, such a generated library is comprised of greater than 10³ to greater than 10¹⁰⁰⁰ different progeny molecular species.

In one aspect, a set of finalized chimeric nucleic acid molecules, produced as described, is comprised of a polynucleotide encoding a polypeptide. According to one aspect, this polynucleotide is a gene, which may be a man-made gene. According to another aspect, this polynucleotide is a gene pathway, which may be a man-made gene pathway. The invention provides that one or more man-made genes generated by the invention may be incorporated into a man-made gene pathway, such as a pathway operable in a eukaryotic organism (including a plant).

In another exemplification, the synthetic nature of the step in which the building blocks are generated allows the design and introduction of nucleotides (e.g., one or more nucleotides, which may be, for example, codons or introns or regulatory sequences) that can later be optionally removed in an in vitro process (e.g., by mutagenesis) or in an in vivo process (e.g., by utilizing the gene splicing ability of a host organism). It is appreciated that in many instances the introduction of these nucleotides may also be desirable for many other reasons in addition to the potential benefit of creating a serviceable demarcation point.

Thus, according to another aspect, the invention provides that a nucleic acid building block can be used to introduce an intron. Thus, the invention provides that functional introns may be introduced into a man-made gene of the invention. The invention also provides that functional introns may be introduced into a man-made gene pathway of the invention. Accordingly, the invention provides for the generation of a chimeric polynucleotide that is a man-made gene containing one (or more) artificially introduced intron(s).

The invention also provides for the generation of a chimeric polynucleotide that is a man-made gene pathway containing one (or more) artificially introduced intron(s). In one aspect, the artificially introduced intron(s) are functional in one or more host cells for gene splicing much in the way that naturally-occurring introns serve functionally in gene splicing. The invention provides a process of producing man-made intron-containing polynucleotides to be introduced into host organisms for recombination and/or splicing.

A man-made gene produced using the invention can also serve as a substrate for recombination with another nucleic acid. Likewise, a man-made gene pathway produced using the invention can also serve as a substrate for recombination with another nucleic acid. In one aspect, the recombination is facilitated by, or occurs at, areas of homology between the man-made, intron-containing gene and a nucleic acid, which serves as a recombination partner. In one aspect, the recombination partner may also be a nucleic acid generated by the invention, including a man-made gene or a man-made gene pathway. Recombination may be facilitated by or may occur at areas of homology that exist at the one (or more) artificially introduced intron(s) in the man-made gene.

In one aspect, the synthetic gene reassembly method of the invention utilizes a plurality of nucleic acid building blocks, each of which in one aspect has two ligatable ends. The two ligatable ends on each nucleic acid building block may be two blunt ends (i.e., each having an overhang of zero nucleotides), or in one aspect one blunt end and one overhang, or in yet another aspect two overhangs. In one aspect, a useful overhang for this purpose may be a 3′ overhang or a 5′ overhang. Thus, a nucleic acid building block may have a 3′ overhang or alternatively a 5′ overhang or alternatively two 3′ overhangs or alternatively two 5′ overhangs. The overall order in which the nucleic acid building blocks are assembled to form a finalized chimeric nucleic acid molecule is determined by purposeful experimental design and is not random.

In one aspect, a nucleic acid building block is generated by chemical synthesis of two single-stranded nucleic acids (also referred to as single-stranded oligos) and contacting them so as to allow them to anneal to form a double-stranded nucleic acid building block. A double-stranded nucleic acid building block can be of variable size. The sizes of these building blocks can be small or large. Exemplary sizes for building blocks range from 1 base pair (not including any overhangs) to 100,000 base pairs (not including any overhangs). Other exemplary size ranges are also provided, which have lower limits of from 1 bp to 10,000 bp (including every integer value in between) and upper limits of from 2 bp to 100,000 bp (including every integer value in between).

Many methods exist by which a double-stranded nucleic acid building block can be generated that is serviceable for the invention; and these are known in the art and can be readily performed by the skilled artisan. According to one aspect, a double-stranded nucleic acid building block is generated by first generating two single-stranded nucleic acids and allowing them to anneal to form a double-stranded nucleic acid building block. The two strands of a double-stranded nucleic acid building block may be complementary at every nucleotide apart from any that form an overhang; thus containing no mismatches, apart from any overhang(s). According to another aspect, the two strands of a double-stranded nucleic acid building block are complementary at fewer than every nucleotide apart from any that form an overhang. Thus, according to this aspect, a double-stranded nucleic acid building block can be used to introduce codon degeneracy. In one aspect, the codon degeneracy is introduced using the site-saturation mutagenesis described herein, using one or more N,N,G/T cassettes or alternatively using one or more N,N,N cassettes.
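The difference between an N,N,G/T cassette and an N,N,N cassette can be made concrete by enumerating the codons each one represents. The sketch below uses the standard genetic code; the helper names `cassette_codons` and `summarize` are hypothetical and serve only this illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by first, second and third base
# over "TCAG" (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cassette_codons(third_position_bases):
    """All codons with any base at positions 1 and 2 and the given bases at position 3."""
    return ["".join(c) for c in product("ACGT", "ACGT", third_position_bases)]


def summarize(codons):
    """Return (number of codons, number of distinct amino acids, stop codons present)."""
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), len(amino_acids), stops


nng_t = summarize(cassette_codons("GT"))    # N,N,G/T cassette
nnn = summarize(cassette_codons("ACGT"))    # N,N,N cassette
print(nng_t)
print(nnn)
```

Under the standard code, the N,N,G/T cassette spans 32 codons that still encode all 20 amino acids while including only one stop codon (TAG), whereas the N,N,N cassette spans all 64 codons including three stop codons.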

The in vivo recombination method of the invention can be performed blindly on a pool of unknown hybrids or alleles of a specific polynucleotide or sequence. However, it is not necessary to know the actual DNA or RNA sequence of the specific polynucleotide. The approach of using recombination within a mixed population of genes can be useful for the generation of any useful proteins, for example, a cellulase of the invention or a variant thereof. This approach may be used to generate proteins having altered specificity or activity. The approach may also be useful for the generation of hybrid nucleic acid sequences, for example, promoter regions, introns, exons, enhancer sequences, 3′ untranslated regions or 5′ untranslated regions of genes. Thus, this approach may be used to generate genes having increased rates of expression. This approach may also be useful in the study of repetitive DNA sequences. Finally, this approach may be useful to make ribozymes or aptamers of the invention.

In one aspect, the invention described herein is directed to the use of repeated cycles of reductive reassortment, recombination and selection which allow for the directed molecular evolution of highly complex linear sequences, such as DNA, RNA or proteins, through recombination.

Optimized Directed Evolution System

The invention provides a non-stochastic gene modification system termed “optimized directed evolution system” to generate polypeptides, e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes or antibodies of the invention, with new or altered properties. In one aspect, optimized directed evolution is directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of nucleic acids through recombination.

Optimized directed evolution allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events. A crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence. This method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events.

In addition, this method provides a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems. Previously, if one generated, for example, 10¹³ chimeric molecules during a reaction, it would be extremely difficult to test such a high number of chimeric variants for a particular activity. Moreover, a significant portion of the progeny population would have a very high number of crossover events which resulted in proteins that were less likely to have increased levels of a particular activity. By using these methods, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although one can still generate 10¹³ chimeric molecules during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait.

One method for creating a chimeric progeny polynucleotide sequence is to create oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide in one aspect includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. Alternative protocols for practicing these methods of the invention can be found in U.S. Pat. Nos. 6,773,900; 6,740,506; 6,713,282; 6,635,449; 6,605,449; 6,537,776; 6,361,974.

The number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created. For example, three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature. As one example, a set of 50 oligonucleotide sequences can be generated corresponding to each portion of each parental variant. Accordingly, during the ligation reassembly process there could be up to 50 crossover events within each of the chimeric sequences. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity, it is likely that in some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event. If the concentration of each oligonucleotide from each parent is kept constant during any ligation step in this example, there is a ⅓ chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and produce no crossover.
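The equimolar case described above can be worked through numerically. The sketch below assumes a simplified model that is not stated in this disclosure: each junction between adjacent fragments is treated as independent, a crossover occurs at a junction with probability 1 − 1/(number of parents), and a chimera of n fragments is counted as having n − 1 junctions. Under those assumptions the number of crossovers follows a binomial distribution. The function name `crossover_pmf` is hypothetical.

```python
from math import comb


def crossover_pmf(n_fragments, n_parents):
    """Probability of k crossovers, k = 0..(n_fragments - 1), when every
    parent's oligonucleotide is present in equal molar quantity."""
    junctions = n_fragments - 1          # assumption: one junction per adjacent pair
    p_cross = 1 - 1 / n_parents          # 2/3 for three parents
    return [
        comb(junctions, k) * p_cross**k * (1 - p_cross) ** (junctions - k)
        for k in range(junctions + 1)
    ]


pmf = crossover_pmf(n_fragments=50, n_parents=3)
mean_crossovers = sum(k * p for k, p in enumerate(pmf))
p_strictly_alternating = pmf[-1]         # a crossover at every junction
p_at_most_three = sum(pmf[:4])           # 0, 1, 2 or 3 crossovers
print(mean_crossovers, p_strictly_alternating, p_at_most_three)
```

In this simplified model the distribution centers far above a small target such as three crossovers, which illustrates why equimolar mixing alone yields few low-crossover chimeras.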

Accordingly, a probability density function (PDF) can be determined to predict the population of crossover events that are likely to occur during each step in a ligation reaction given a set number of parental variants, a number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step in the ligation reaction. The statistics and mathematics behind determining the PDF are described below. By utilizing these methods, one can calculate such a probability density function, and thus enrich the chimeric progeny population for a predetermined number of crossover events resulting from a particular ligation reaction. Moreover, a target number of crossover events can be predetermined, and the system then programmed to calculate the starting quantities of each parental oligonucleotide during each step in the ligation reaction to result in a probability density function that centers on the predetermined number of crossover events. These methods are directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of a nucleic acid encoding a polypeptide through recombination. This system allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events. A crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence. The method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events.
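One way to picture how starting quantities might be chosen to center the crossover distribution on a target is sketched below. This is a simplified illustration under assumptions that are not taken from this disclosure: the same relative concentrations are used at every position, one "major" parent is supplied at fraction c and the remaining parents share the rest equally, and adjacent fragments are drawn independently. The function names are hypothetical, and bisection is used here only as a convenient numerical method.

```python
def crossover_probability(major_fraction, n_parents):
    """Chance that two adjacent fragments come from different parents when
    one parent has molar fraction `major_fraction` and the others share the
    remainder equally (independent draws at each position)."""
    minor = (1 - major_fraction) / (n_parents - 1)
    same_parent = major_fraction**2 + (n_parents - 1) * minor**2
    return 1 - same_parent


def major_fraction_for_target(target_crossovers, n_fragments, n_parents, tol=1e-12):
    """Find the major-parent fraction whose expected number of crossovers
    equals `target_crossovers`, by bisection on [1/n_parents, 1]."""
    junctions = n_fragments - 1
    target_p = target_crossovers / junctions
    max_p = 1 - 1 / n_parents            # reached at equimolar concentrations
    if not 0 <= target_p <= max_p:
        raise ValueError("target is not reachable in this simplified model")
    lo, hi = 1 / n_parents, 1.0          # crossover probability falls from max_p to 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if crossover_probability(mid, n_parents) > target_p:
            lo = mid                     # still too many crossovers: skew further
        else:
            hi = mid
    return (lo + hi) / 2


c = major_fraction_for_target(target_crossovers=3, n_fragments=50, n_parents=3)
expected = crossover_probability(c, 3) * 49
print(c, expected)
```

The sketch matches only the mean of the distribution; a fuller treatment would vary the concentrations at each ligation step and fit the whole probability density function.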

In addition, these methods provide a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems. By using the methods described herein, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although one can still generate 10¹³ chimeric molecules during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait.

In one aspect, the method creates a chimeric progeny polynucleotide sequence by creating oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide in one aspect includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. See also U.S. Pat. Nos. 6,773,900; 6,740,506; 6,713,282; 6,635,449; 6,605,449; 6,537,776; 6,361,974.

Determining Crossover Events

Aspects of the invention include a system and software that receive a desired crossover probability density function (PDF), the number of parent genes to be reassembled, and the number of fragments in the reassembly as inputs. The output of this program is a “fragment PDF” that can be used to determine a recipe for producing reassembled genes, and the estimated crossover PDF of those genes. The processing described herein is in one aspect performed in MATLAB™ (The Mathworks, Natick, Mass.), a programming language and development environment for technical computing.

Iterative Processes

Any process of the invention can be iteratively repeated, e.g., a nucleic acid encoding an altered or new cellulase phenotype, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention, can be identified, re-isolated, again modified, and re-tested for activity. This process can be iteratively repeated until a desired phenotype is engineered. For example, an entire biochemical anabolic or catabolic pathway can be engineered into a cell, including, e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity.

Similarly, if it is determined that a particular oligonucleotide has no effect at all on the desired trait (e.g., a new cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme phenotype), it can be removed as a variable by synthesizing larger parental oligonucleotides that include the sequence to be removed. Since incorporating the sequence within a larger sequence prevents any crossover events, there will no longer be any variation of this sequence in the progeny polynucleotides. This iterative practice of determining which oligonucleotides are most related to the desired trait, and which are unrelated, allows more efficient exploration of all of the possible protein variants that might provide a particular trait or activity.

In Vivo Shuffling

In various aspects, in vivo shuffling of molecules is used in methods of the invention to provide variants of polypeptides of the invention, e.g., antibodies of the invention or cellulases of the invention, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes, and the like. In vivo shuffling can be performed utilizing the natural property of cells to recombine multimers. While recombination in vivo has provided the major natural route to molecular diversity, genetic recombination remains a relatively complex process that involves 1) the recognition of homologies; 2) strand cleavage, strand invasion, and metabolic steps leading to the production of recombinant chiasma; and finally 3) the resolution of chiasma into discrete recombined molecules. The formation of the chiasma requires the recognition of homologous sequences.

In another aspect, the invention includes a method for producing a hybrid polynucleotide from at least a first polynucleotide and a second polynucleotide. The invention can be used to produce a hybrid polynucleotide by introducing at least a first polynucleotide and a second polynucleotide (e.g., one, or both, being an exemplary cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme-encoding sequence of the invention) which share at least one region of partial sequence homology into a suitable host cell. The regions of partial sequence homology promote processes which result in sequence reorganization producing a hybrid polynucleotide. The term “hybrid polynucleotide”, as used herein, is any nucleotide sequence which results from the method of the present invention and contains sequence from at least two original polynucleotide sequences. Such hybrid polynucleotides can result from intermolecular recombination events which promote sequence integration between DNA molecules. In addition, such hybrid polynucleotides can result from intramolecular reductive reassortment processes which utilize repeated sequences to alter a nucleotide sequence within a DNA molecule.

In one aspect, in vivo reassortment focuses on “inter-molecular” processes collectively referred to as “recombination”, which in bacteria is generally viewed as a “RecA-dependent” phenomenon. The invention can rely on recombination processes of a host cell to recombine and re-assort sequences, or the cells' ability to mediate reductive processes to decrease the complexity of quasi-repeated sequences in the cell by deletion. This process of “reductive reassortment” occurs by an “intra-molecular”, RecA-independent process.

In another aspect of the invention, novel polynucleotides can be generated by the process of reductive reassortment. The method involves the generation of constructs containing consecutive sequences (original encoding sequences), their insertion into an appropriate vector and their subsequent introduction into an appropriate host cell. The reassortment of the individual molecular identities occurs by combinatorial processes between the consecutive sequences in the construct possessing regions of homology, or between quasi-repeated units. The reassortment process recombines and/or reduces the complexity and extent of the repeated sequences and results in the production of novel molecular species. Various treatments may be applied to enhance the rate of reassortment. These could include treatment with ultra-violet light, or DNA damaging chemicals and/or the use of host cell lines displaying enhanced levels of “genetic instability”. Thus, the reassortment process may involve homologous recombination or the natural property of quasi-repeated sequences to direct their own evolution.

Repeated or “quasi-repeated” sequences play a role in genetic instability. In one aspect, “quasi-repeats” are repeats that are not restricted to their original unit structure. Quasi-repeated units can be presented as an array of sequences in a construct; consecutive units of similar sequences. Once ligated, the junctions between the consecutive sequences become essentially invisible and the quasi-repetitive nature of the resulting construct is now continuous at the molecular level. The deletion process the cell performs to reduce the complexity of the resulting construct operates between the quasi-repeated sequences. The quasi-repeated units provide a practically limitless repertoire of templates upon which slippage events can occur. In one aspect, the constructs containing the quasi-repeats thus effectively provide sufficient molecular elasticity that deletion (and potentially insertion) events can occur virtually anywhere within the quasi-repetitive units.

When the quasi-repeated sequences are all ligated in the same orientation, for instance head to tail or vice versa, the cell cannot distinguish individual units. Consequently, the reductive process can occur throughout the sequences. In contrast, when for example, the units are presented head to head, rather than head to tail, the inversion delineates the endpoints of the adjacent unit so that deletion formation will favor the loss of discrete units. Thus, it is preferable with the present method that the sequences are in the same orientation. Random orientation of quasi-repeated sequences will result in the loss of reassortment efficiency, while consistent orientation of the sequences will offer the highest efficiency. However, while having fewer of the contiguous sequences in the same orientation decreases the efficiency, it may still provide sufficient elasticity for the effective recovery of novel molecules. Constructs can be made with the quasi-repeated sequences in the same orientation to allow higher efficiency.

Sequences can be assembled in a head to tail orientation using any of a variety of methods, including the following:

a) Primers that include a poly-A head and poly-T tail which when made single-stranded would provide orientation can be utilized. This is accomplished by having the first few bases of the primers made from RNA and hence easily removed by RNaseH.
b) Primers that include unique restriction cleavage sites can be utilized. Multiple sites, a battery of unique sequences and repeated synthesis and ligation steps would be required.
c) The inner few bases of the primer could be thiolated and an exonuclease used to produce properly tailed molecules.

In one aspect, the recovery of the re-assorted sequences relies on the identification of cloning vectors with a reduced repetitive index (RI). The re-assorted encoding sequences can then be recovered by amplification. The products are re-cloned and expressed. The recovery of cloning vectors with reduced RI can be effected by:

1) The use of vectors only stably maintained when the construct is reduced in complexity.
2) The physical recovery of shortened vectors by physical procedures. In this case, the cloning vector would be recovered using standard plasmid isolation procedures and size fractionated on either an agarose gel, or column with a low molecular weight cut off utilizing standard procedures.
3) The recovery of vectors containing interrupted genes which can be selected when insert size decreases.
4) The use of direct selection techniques with an expression vector and the appropriate selection.

Encoding sequences (for example, genes) from related organisms may demonstrate a high degree of homology and encode quite diverse protein products. These types of sequences are particularly useful in the present invention as quasi-repeats. However, while the examples illustrated below demonstrate the reassortment of nearly identical original encoding sequences (quasi-repeats), this process is not limited to such nearly identical repeats.

The following example demonstrates an exemplary method of the invention. Encoding nucleic acid sequences (quasi-repeats) derived from three (3) unique species are described. Each sequence encodes a protein with a distinct set of properties. Each of the sequences differs by a single or a few base pairs at a unique position in the sequence. The quasi-repeated sequences are separately or collectively amplified and ligated into random assemblies such that all possible permutations and combinations are available in the population of ligated molecules. The number of quasi-repeat units can be controlled by the assembly conditions. The average number of quasi-repeated units in a construct is defined as the repetitive index (RI).
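The definition of the repetitive index can be illustrated with a short calculation. The constructs below are hypothetical toy data in which each construct is written as a list of quasi-repeat units labeled by source species; the labels and the function name `repetitive_index` are assumptions made only for this sketch.

```python
from statistics import mean


def repetitive_index(constructs):
    """Average number of quasi-repeated units per construct (RI)."""
    return mean(len(units) for units in constructs)


# Hypothetical population of ligated constructs before reductive reassortment.
before = [
    ["sp1", "sp2", "sp3"],
    ["sp2", "sp2"],
    ["sp3", "sp1", "sp1", "sp2"],
]
# Hypothetical population after some units have been lost by deletion.
after = [
    ["sp1"],
    ["sp2"],
    ["sp3", "sp1"],
]
ri_before = repetitive_index(before)
ri_after = repetitive_index(after)
print(ri_before, ri_after)
```

A drop in RI between the two populations is the kind of reduction that the recovery procedures listed above are meant to detect.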

Once formed, the constructs may, or may not, be size fractionated on an agarose gel according to published protocols, inserted into a cloning vector and transfected into an appropriate host cell. The cells are then propagated and “reductive reassortment” is effected. The rate of the reductive reassortment process may be stimulated by the introduction of DNA damage if desired. Whether the reduction in RI is mediated by deletion formation between repeated sequences by an “intra-molecular” mechanism, or mediated by recombination-like events through “inter-molecular” mechanisms is immaterial. The end result is a reassortment of the molecules into all possible combinations.

Optionally, the method comprises the additional step of screening the library members of the shuffled pool to identify individual shuffled library members having the ability to bind or otherwise interact, or catalyze a particular reaction (e.g., such as catalytic domain of an enzyme) with a predetermined macromolecule, such as for example a proteinaceous receptor, an oligosaccharide, virion, or other predetermined compound or structure.

The polypeptides that are identified from such libraries can be used for therapeutic, diagnostic, research and related purposes (e.g., catalysts, solutes for increasing osmolarity of an aqueous solution and the like) and/or can be subjected to one or more additional cycles of shuffling and/or selection.

In another aspect, it is envisioned that prior to or during recombination or reassortment, polynucleotides generated by the method of the invention can be subjected to agents or processes which promote the introduction of mutations into the original polynucleotides. The introduction of such mutations would increase the diversity of resulting hybrid polynucleotides and polypeptides encoded therefrom. The agents or processes which promote mutagenesis can include, but are not limited to: (+)-CC-1065, or a synthetic analog such as (+)-CC-1065-(N3-Adenine) (See Sun and Hurley, (1992)); an N-acetylated or deacetylated 4′-fluoro-4-aminobiphenyl adduct capable of inhibiting DNA synthesis (See, for example, van de Poll et al. (1992)); or a N-acetylated or deacetylated 4-aminobiphenyl adduct capable of inhibiting DNA synthesis (See also, van de Poll et al. (1992), pp. 751-758); trivalent chromium, a trivalent chromium salt, a polycyclic aromatic hydrocarbon (PAH) DNA adduct capable of inhibiting DNA replication, such as 7-bromomethyl-benz[a]anthracene (“BMA”), tris(2,3-dibromopropyl)phosphate (“Tris-BP”), 1,2-dibromo-3-chloropropane (“DBCP”), 2-bromoacrolein (2BA), benzo[a]pyrene-7,8-dihydrodiol-9-10-epoxide (“BPDE”), a platinum(II) halogen salt, N-hydroxy-2-amino-3-methylimidazo[4,5-f]-quinoline (“N-hydroxy-IQ”) and N-hydroxy-2-amino-1-methyl-6-phenylimidazo[4,5-f]-pyridine (“N-hydroxy-PhIP”). Exemplary means for slowing or halting PCR amplification consist of UV light, (+)-CC-1065 and (+)-CC-1065-(N3-Adenine). Particularly encompassed means are DNA adducts or polynucleotides comprising the DNA adducts from the polynucleotides or polynucleotides pool, which can be released or removed by a process including heating the solution comprising the polynucleotides prior to further processing.

In another aspect, the invention is directed to a method of producing recombinant proteins having biological activity by treating a sample comprising double-stranded template polynucleotides encoding a wild-type protein under conditions according to the invention which provide for the production of hybrid or re-assorted polynucleotides.

Producing Sequence Variants

The invention also provides additional methods for making sequence variants of the nucleic acid (e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme) sequences of the invention. The invention also provides additional methods for isolating cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes using the nucleic acids and polypeptides of the invention. In one aspect, the invention provides for variants of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme coding sequence (e.g., a gene, cDNA or message) of the invention, which can be altered by any means, including, e.g., random or stochastic methods, or, non-stochastic, or “directed evolution,” methods, as described above.

The isolated variants may be naturally occurring. Variants can also be created in vitro. Variants may be created using genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be created using chemical synthesis or modification procedures. Other methods of making variants are also familiar to those skilled in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids which encode polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. These nucleotide differences can result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates.

For example, variants may be created using error prone PCR. In one aspect of error prone PCR, the PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Error prone PCR is described, e.g., in Leung (1989) Technique 1:11-15 and Caldwell (1992) PCR Methods Applic. 2:28-33. Briefly, in such procedures, nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer, MgCl₂, MnCl₂, Taq polymerase and an appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR product. For example, the reaction may be performed using 20 fmoles of nucleic acid to be mutagenized, 30 pmole of each PCR primer, a reaction buffer comprising 50 mM KCl, 10 mM Tris HCl (pH 8.3) and 0.01% gelatin, 7 mM MgCl₂, 0.5 mM MnCl₂, 5 units of Taq polymerase, 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, and 1 mM dTTP. PCR may be performed for 30 cycles of 94° C. for 1 min, 45° C. for 1 min, and 72° C. for 1 min. However, it will be appreciated that these parameters may be varied as appropriate. The mutagenized nucleic acids are cloned into an appropriate vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated.
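As a rough illustration of the outcome described above, the following Python sketch simulates low-fidelity copying of a template over a number of cycles. It is a toy model only and does not represent the reaction chemistry: the function names, the per-base error rate and the uniform choice of replacement base are assumptions introduced here for illustration, and a real error prone PCR shows a biased mutation spectrum that this sketch ignores.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Copy a template, substituting each base with probability error_rate."""
    copied = []
    for base in template:
        if rng.random() < error_rate:
            # Model a point mutation by picking one of the three other bases.
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def mutagenize(template, cycles, error_rate, seed=0):
    """Follow a single lineage through repeated low-fidelity copying."""
    rng = random.Random(seed)
    seq = template
    for _ in range(cycles):
        seq = error_prone_copy(seq, error_rate, rng)
    return seq


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)
```

For instance, `mutagenize("ATGGCC" * 10, cycles=30, error_rate=0.001)` returns a sequence of the same length as the template, and `count_differences` reports how many positions were changed along its entire length.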

In one aspect, variants are created using oligonucleotide directed mutagenesis to generate site-specific mutations in any cloned DNA of interest. Oligonucleotide mutagenesis is described, e.g., in Reidhaar-Olson (1988) Science 241:53-57. Briefly, in such procedures a plurality of double stranded oligonucleotides bearing one or more mutations to be introduced into the cloned DNA are synthesized and inserted into the cloned DNA to be mutagenized. In one aspect, clones containing the mutagenized DNA are recovered, expressed, and the activities of the polypeptide encoded therein assessed.

Another method for generating variants is assembly PCR. Assembly PCR involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described in, e.g., U.S. Pat. No. 5,965,408.

In one aspect, sexual PCR mutagenesis is an exemplary method of generating variants of the invention. In one aspect of sexual PCR mutagenesis forced homologous recombination occurs between DNA molecules of different but highly related DNA sequence in vitro, as a result of random fragmentation of the DNA molecule based on sequence homology, followed by fixation of the crossover by primer extension in a PCR reaction. Sexual PCR mutagenesis is described, e.g., in Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751. Briefly, in such procedures a plurality of nucleic acids to be recombined are digested with DNase to generate fragments having an average size of 50-200 nucleotides. Fragments of the desired average size are purified and resuspended in a PCR mixture. PCR is conducted under conditions which facilitate recombination between the nucleic acid fragments. For example, PCR may be performed by resuspending the purified fragments at a concentration of 10-30 ng/μl in a solution of 0.2 mM of each dNTP, 2.2 mM MgCl₂, 50 mM KCl, 10 mM Tris HCl, pH 9.0, and 0.1% Triton X-100. 2.5 units of Taq polymerase per 100 μl of reaction mixture is added and PCR is performed using the following regime: 94° C. for 60 seconds, 94° C. for 30 seconds, 50-55° C. for 30 seconds, 72° C. for 30 seconds (30-45 times) and 72° C. for 5 minutes. However, it will be appreciated that these parameters may be varied as appropriate. In some aspects, oligonucleotides may be included in the PCR reactions. In other aspects, the Klenow fragment of DNA polymerase I may be used in a first set of PCR reactions and Taq polymerase may be used in a subsequent set of PCR reactions. Recombinant sequences are isolated and the activities of the polypeptides they encode are assessed.
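The recombination outcome described above can be pictured with a short Python sketch. It is a deliberately simplified model and not the procedure itself: it assumes the parent sequences are already aligned and of equal length, it draws fragment lengths uniformly from the 50-200 nucleotide range mentioned above, and it lets crossovers occur only at fragment boundaries. The function name and parameters are hypothetical.

```python
import random


def shuffle_parents(parents, min_frag=50, max_frag=200, seed=0):
    """Build one chimeric sequence from aligned, equal-length parents.

    Each successive fragment is copied from a randomly chosen parent,
    so a crossover can occur at every fragment boundary.
    """
    if not parents:
        raise ValueError("at least one parent sequence is required")
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes equal-length, aligned parents")

    rng = random.Random(seed)
    pieces = []
    pos = 0
    while pos < length:
        frag_len = rng.randint(min_frag, max_frag)
        donor = rng.choice(parents)
        # Slicing truncates the final fragment at the end of the sequence.
        pieces.append(donor[pos:pos + frag_len])
        pos += frag_len
    return "".join(pieces)
```

Calling `shuffle_parents` repeatedly with different `seed` values yields a small in silico library of chimeras, each of which keeps the parental length because every position is copied from exactly one parent.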

In one aspect, variants are created by in vivo mutagenesis. In some aspects, random mutations in a sequence of interest are generated by propagating the sequence of interest in a bacterial strain, such as an E. coli strain, which carries mutations in one or more of the DNA repair pathways. Such “mutator” strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA. Mutator strains suitable for use for in vivo mutagenesis are described in PCT Publication No. WO 91/16427, published Oct. 31, 1991, entitled “Methods for Phenotype Creation from Multiple Gene Populations”.

Variants may also be generated using cassette mutagenesis. In cassette mutagenesis a small region of a double stranded DNA molecule is replaced with a synthetic oligonucleotide “cassette” that differs from the native sequence. The oligonucleotide often contains completely and/or partially randomized native sequence.

Recursive ensemble mutagenesis may also be used to generate variants. Recursive ensemble mutagenesis is an algorithm for protein engineering (protein mutagenesis) developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Recursive ensemble mutagenesis is described, e.g., in Arkin (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815.

In some aspects, variants are created using exponential ensemble mutagenesis. Exponential ensemble mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Exponential ensemble mutagenesis is described, e.g., in Delegrave (1993) Biotechnology Res. 11:1548-1552. Random and site-directed mutagenesis are described, e.g., in Arnold (1993) Current Opinion in Biotechnology 4:450-455.

In some aspects, the variants are created using shuffling procedures wherein portions of a plurality of nucleic acids which encode distinct polypeptides are fused together to create chimeric nucleic acid sequences which encode chimeric polypeptides as described in U.S. Pat. No. 5,965,408, filed Jul. 9, 1996, entitled, “Method of DNA Reassembly by Interrupting Synthesis” and U.S. Pat. No. 5,939,250, filed May 22, 1996, entitled, “Production of Enzymes Having Desired Activities by Mutagenesis.”

The variants of the polypeptides of the invention may be variants in which one or more of the amino acid residues of the polypeptides of the sequences of the invention are substituted with a conserved or non-conserved amino acid residue (in one aspect a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code.

In one aspect, conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics. In one aspect, conservative substitutions of the invention comprise the following replacements: replacement of an aliphatic amino acid such as Alanine, Valine, Leucine and Isoleucine with another aliphatic amino acid; replacement of a Serine with a Threonine or vice versa; replacement of an acidic residue such as Aspartic acid and Glutamic acid with another acidic residue; replacement of a residue bearing an amide group, such as Asparagine and Glutamine, with another residue bearing an amide group; exchange of a basic residue such as Lysine and Arginine with another basic residue; and replacement of an aromatic residue such as Phenylalanine and Tyrosine with another aromatic residue.
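The replacement classes listed above can be written down as a small lookup, sketched here in Python. The groups contain only the residues named in the text (one-letter codes); the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and treating an identical residue as "not a substitution" is an assumption of this sketch.

```python
# One-letter codes for the residue classes named in the text above.
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # aliphatic: Alanine, Valine, Leucine, Isoleucine
    set("ST"),    # Serine and Threonine
    set("DE"),    # acidic: Aspartic acid, Glutamic acid
    set("NQ"),    # amide-bearing: Asparagine, Glutamine
    set("KR"),    # basic: Lysine, Arginine
    set("FY"),    # aromatic: Phenylalanine, Tyrosine
]


def is_conservative(original, replacement):
    """Return True if both residues fall within one listed class."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residue: not a substitution at all
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Under these groupings, `is_conservative("A", "V")` is true because both residues are aliphatic, whereas `is_conservative("D", "K")` is false because an acidic residue is exchanged for a basic one.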

Other variants are those in which one or more of the amino acid residues of a polypeptide of the invention includes a substituent group. In one aspect, other variants are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol). Additional variants are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide.

In some aspects, the fragments, derivatives and analogs retain the same biological function or activity as the polypeptides of the invention. In other aspects, the fragment, derivative, or analog includes a proprotein, such that the fragment, derivative, or analog can be activated by cleavage of the proprotein portion to produce an active polypeptide.

Optimizing Codons to Achieve High Levels of Protein Expression in Host Cells

The invention provides methods for modifying cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase, enzyme-encoding nucleic acids to modify (e.g., optimize) codon usage. In one aspect, the invention provides methods for modifying codons in a nucleic acid encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme to increase or decrease its expression in a host cell. The invention also provides nucleic acids encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme modified to increase its expression in a host cell, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme so modified, and methods of making the modified cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes. The method comprises identifying a “non-preferred” or a “less preferred” codon in a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase, enzyme-encoding nucleic acid and replacing one or more of these non-preferred or less preferred codons with a “preferred codon” encoding the same amino acid as the replaced codon, such that at least one non-preferred or less preferred codon in the nucleic acid has been replaced by a preferred codon encoding the same amino acid. A preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell.
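The replacement step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a disclosed implementation: the `preferred` mapping in the usage note is a made-up table for an imaginary host, not measured codon usage data, and the function names are hypothetical. The codon table itself is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq):
    """Translate an in-frame DNA coding sequence to one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def optimize_codons(seq, preferred):
    """Swap each codon for the host's preferred synonym, where one is given.

    `preferred` maps an amino acid letter to the codon to use for it.
    """
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        # Codons for amino acids absent from `preferred` are left unchanged.
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)
```

With the hypothetical table `{"L": "CTG", "R": "CGT", "K": "AAA"}`, the sequence `ATGCTAAGGAAGTAA` becomes `ATGCTGCGTAAATAA`; both translate to the same amino acid sequence, which is the property the method relies on.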

Host cells for expressing the nucleic acids, expression cassettes and vectors of the invention include bacteria, yeast, fungi, plant cells, insect cells and mammalian cells (see discussion, above). Thus, the invention provides methods for optimizing codon usage in all of these cells, codon-altered nucleic acids and polypeptides made by the codon-altered nucleic acids. Exemplary host cells include gram negative bacteria, such as Escherichia coli, Pseudomonas fluorescens; gram positive bacteria, such as Streptomyces sp., Lactobacillus gasseri, Lactococcus lactis, Lactococcus cremoris, Bacillus subtilis, Bacillus cereus. Exemplary host cells also include eukaryotic organisms, e.g., various yeast, such as Saccharomyces sp., including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, and Kluyveromyces lactis, Hansenula polymorpha, Aspergillus niger, and mammalian cells and cell lines and insect cells and cell lines. Thus, the invention also includes nucleic acids and polypeptides optimized for expression in these organisms and species.

For example, the codons of a nucleic acid encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme isolated from a bacterial cell are modified such that the nucleic acid is optimally expressed in a bacterial cell different from the bacteria from which the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme was derived, a yeast, a fungus, a plant cell, an insect cell or a mammalian cell. Methods for optimizing codons are well known in the art, see, e.g., U.S. Pat. No. 5,795,737; Baca (2000) Int. J. Parasitol. 30:113-118; Hale (1998) Protein Expr. Purif. 12:185-188; Narum (2001) Infect. Immun. 69:7250-7253. See also Narum (2001) Infect. Immun. 69:7250-7253, describing optimizing codons in mouse systems; Outchkourov (2002) Protein Expr. Purif. 24:18-24, describing optimizing codons in yeast; Feng (2000) Biochemistry 39:15399-15409, describing optimizing codons in E. coli; Humphreys (2000) Protein Expr. Purif. 20:252-264, describing optimizing codon usage that affects secretion in E. coli.

Transgenic Non-Human Animals

The invention provides transgenic non-human animals comprising a nucleic acid, a polypeptide (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme), an expression cassette or vector or a transfected or transformed cell of the invention. The invention also provides methods of making and using these transgenic non-human animals.

The transgenic non-human animals can be, e.g., dogs, goats, rabbits, sheep, pigs (including all swine, hogs and related animals), cows, rats and mice, comprising the nucleic acids of the invention. These animals can be used, e.g., as in vivo models to study cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, or, as models to screen for agents that change the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity in vivo. The coding sequences for the polypeptides to be expressed in the transgenic non-human animals can be designed to be constitutive, or, under the control of tissue-specific, developmental-specific or inducible transcriptional regulatory factors.

Transgenic non-human animals can be designed and generated using any method known in the art; see, e.g., U.S. Pat. Nos. 6,211,428; 6,187,992; 6,156,952; 6,118,044; 6,111,166; 6,107,541; 5,959,171; 5,922,854; 5,892,070; 5,880,327; 5,891,698; 5,639,940; 5,573,933; 5,387,742; 5,087,571, describing making and using transformed cells and eggs and transgenic mice, rats, rabbits, sheep, pigs and cows. See also, e.g., Pollock (1999) J. Immunol. Methods 231:147-157, describing the production of recombinant proteins in the milk of transgenic dairy animals; Baguisi (1999) Nat. Biotechnol. 17:456-461, demonstrating the production of transgenic goats. U.S. Pat. No. 6,211,428, describes making and using transgenic non-human mammals which express in their brains a nucleic acid construct comprising a DNA sequence. U.S. Pat. No. 5,387,742, describes injecting cloned recombinant or synthetic DNA sequences into fertilized mouse eggs, implanting the injected eggs in pseudo-pregnant females, and growing to term transgenic mice. U.S. Pat. No. 6,187,992, describes making and using a transgenic mouse.

“Knockout animals” can also be used to practice the methods of the invention. For example, in one aspect, the transgenic or modified animals of the invention comprise a “knockout animal,” e.g., a “knockout mouse,” engineered not to express an endogenous gene, which is replaced with a gene expressing a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention, or, a fusion protein comprising a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention.

Transgenic Plants and Seeds

The invention provides transgenic plants and seeds comprising a nucleic acid, a polypeptide (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme), an expression cassette or vector or a transfected or transformed cell of the invention. The invention also provides plant products, e.g., oils, seeds, leaves, extracts and the like, comprising a nucleic acid and/or a polypeptide (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme) of the invention. The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot). The invention also provides methods of making and using these transgenic plants and seeds. The transgenic plant or plant cell expressing a polypeptide of the present invention may be constructed in accordance with any method known in the art. See, for example, U.S. Pat. No. 6,309,872.

Nucleic acids and expression constructs of the invention can be introduced into a plant cell by any means. For example, nucleic acids or expression constructs can be introduced into the genome of a desired plant host, or, the nucleic acids or expression constructs can be episomes. Introduction into the genome of a desired plant can be such that the host's cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme production is regulated by endogenous transcriptional or translational control elements. The invention also provides “knockout plants” where insertion of gene sequence by, e.g., homologous recombination, has disrupted the expression of the endogenous gene. Means to generate “knockout” plants are well-known in the art, see, e.g., Strepp (1998) Proc. Natl. Acad. Sci. USA 95:4368-4373; Miao (1995) Plant J 7:359-365. See discussion on transgenic plants, below.

The nucleic acids of the invention can be used to confer desired traits on essentially any plant, e.g., on starch-producing plants, such as potato, tomato, soybean, beets, corn, wheat, rice, barley, and the like. Nucleic acids of the invention can be used to manipulate metabolic pathways of a plant in order to optimize or alter the host's expression of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. This can change cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity in a plant. Alternatively, a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention can be used in production of a transgenic plant to produce a compound not naturally produced by that plant. This can lower production costs or create a novel product.

In one aspect, the first step in production of a transgenic plant involves making an expression construct for expression in a plant cell. These techniques are well known in the art. They can include selecting and cloning a promoter, a coding sequence for facilitating efficient binding of ribosomes to mRNA and selecting the appropriate gene terminator sequences. One exemplary constitutive promoter is CaMV35S, from the cauliflower mosaic virus, which generally results in a high degree of expression in plants. Other promoters are more specific and respond to cues in the plant's internal or external environment. An exemplary light-inducible promoter is the promoter from the cab gene, encoding the major chlorophyll a/b binding protein.

In one aspect, the nucleic acid is modified to achieve greater expression in a plant cell. For example, a sequence of the invention is likely to have a higher percentage of A-T nucleotide pairs compared to that seen in a plant, some of which prefer G-C nucleotide pairs. Therefore, A-T nucleotides in the coding sequence can be substituted with G-C nucleotides without significantly changing the amino acid sequence to enhance production of the gene product in plant cells.
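One way to picture the substitution described above is a synonymous-codon swap that raises G-C content while leaving the encoded protein untouched. The Python sketch below is an assumption-laden illustration, not a disclosed procedure: it always picks the most G-C rich synonym for every codon, which is a stricter rule than the text requires, and the function names are hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def gc_count(seq):
    """Count G and C nucleotides in a sequence."""
    return seq.count("G") + seq.count("C")


def translate(seq):
    """Translate an in-frame DNA coding sequence to one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def raise_gc(seq):
    """Replace each codon with its most G-C rich synonymous codon."""
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        # Prefer higher G-C content; keep the original codon on a tie.
        out.append(max(synonyms, key=lambda c: (gc_count(c), c == codon)))
    return "".join(out)
```

Because only synonymous codons are considered, `translate(raise_gc(seq))` equals `translate(seq)` for any in-frame coding sequence. A practical design would likely also weigh host codon preference, so this sketch should be read as showing the G-C substitution idea in isolation.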

A selectable marker gene can be added to the gene construct in order to identify plant cells or tissues that have successfully integrated the transgene. This may be necessary because achieving incorporation and expression of genes in plant cells is a rare event, occurring in just a few percent of the targeted tissues or cells. Selectable marker genes encode proteins that provide resistance to agents that are normally toxic to plants, such as antibiotics or herbicides. Only plant cells that have integrated the selectable marker gene will survive when grown on a medium containing the appropriate antibiotic or herbicide. As for other inserted genes, marker genes also require promoter and termination sequences for proper function.

In one aspect, making transgenic plants or seeds comprises incorporating sequences of the invention and, optionally, marker genes into a target expression construct (e.g., a plasmid), along with positioning of the promoter and the terminator sequences. This can involve transferring the modified gene into the plant through a suitable method. For example, a construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. See, e.g., Christou (1997) Plant Mol. Biol. 35:197-203; Pawlowski (1996) Mol. Biotechnol. 6:17-30; Klein (1987) Nature 327:70-73; Takumi (1997) Genes Genet. Syst. 72:63-69, discussing use of particle bombardment to introduce transgenes into wheat; and Adam (1997) supra, for use of particle bombardment to introduce YACs into plant cells. For example, Rinehart (1997) supra, used particle bombardment to generate transgenic cotton plants. Apparatus for accelerating particles is described in U.S. Pat. No. 5,015,580; and, the commercially available BioRad (Biolistics) PDS-2000 particle acceleration instrument; see also, John, U.S. Pat. No. 5,608,148; and Ellis, U.S. Pat. No. 5,681,730, describing particle-mediated transformation of gymnosperms.

In one aspect, protoplasts can be immobilized and injected with nucleic acids, e.g., an expression construct. Although plant regeneration from protoplasts is not easy with cereals, plant regeneration is possible in legumes using somatic embryogenesis from protoplast derived callus. Organized tissues can be transformed with naked DNA using the gene gun technique, where DNA is coated on tungsten microprojectiles, shot 1/100th the size of cells, which carry the DNA deep into cells and organelles. Transformed tissue is then induced to regenerate, usually by somatic embryogenesis. This technique has been successful in several cereal species including maize and rice.

Nucleic acids, e.g., expression constructs, can also be introduced into plant cells using recombinant viruses. Plant cells can be transformed using viral vectors, such as, e.g., tobacco mosaic virus derived vectors (Rouwendal (1997) Plant Mol. Biol. 33:989-999), see Porta (1996) “Use of viral replicons for the expression of genes in plants,” Mol. Biotechnol. 5:209-221.

Alternatively, nucleic acids, e.g., an expression construct, can be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria. Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are well described in the scientific literature. See, e.g., Horsch (1984) Science 233:496-498; Fraley (1983) Proc. Natl. Acad. Sci. USA 80:4803; Gene Transfer to Plants, Potrykus, ed. (Springer-Verlag, Berlin 1995). The DNA in an A. tumefaciens cell is contained in the bacterial chromosome as well as in another structure known as a Ti (tumor-inducing) plasmid. The Ti plasmid contains a stretch of DNA termed T-DNA (˜20 kb long) that is transferred to the plant cell in the infection process and a series of vir (virulence) genes that direct the infection process. A. tumefaciens can only infect a plant through wounds: when a plant root or stem is wounded it gives off certain chemical signals, in response to which the vir genes of A. tumefaciens become activated and direct a series of events necessary for the transfer of the T-DNA from the Ti plasmid to the plant's chromosome. The T-DNA then enters the plant cell through the wound. One speculation is that the T-DNA waits until the plant DNA is being replicated or transcribed, then inserts itself into the exposed plant DNA. In order to use A. tumefaciens as a transgene vector, the tumor-inducing section of T-DNA has to be removed, while retaining the T-DNA border regions and the vir genes. The transgene is then inserted between the T-DNA border regions, where it is transferred to the plant cell and becomes integrated into the plant's chromosomes.

The invention provides for the transformation of monocotyledonous plants using the nucleic acids of the invention, including important cereals, see Hiei (1997) Plant Mol. Biol. 35:205-218. See also, e.g., Horsch, Science (1984) 233:496; Fraley (1983) Proc. Natl. Acad. Sci. USA 80:4803; Thykjaer (1997) supra; Park (1996) Plant Mol. Biol. 32:1135-1148, discussing T-DNA integration into genomic DNA. See also D'Halluin, U.S. Pat. No. 5,712,135, describing a process for the stable integration of a DNA comprising a gene that is functional in a cell of a cereal, or other monocotyledonous plant.

In one aspect, the third step involves selection and regeneration of whole plants capable of transmitting the incorporated target gene to the next generation. Such regeneration techniques may use manipulation of certain phytohormones in a tissue culture growth medium. In one aspect, the method uses a biocide and/or herbicide marker that has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1985. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee (1987) Ann. Rev. of Plant Phys. 38:467-486. To obtain whole plants from transgenic tissues such as immature embryos, the tissues can be grown under controlled environmental conditions in a series of media containing nutrients and hormones, a process known as tissue culture. Once whole plants are generated and produce seed, evaluation of the progeny begins.

In one aspect, after the expression cassette is stably incorporated in transgenic plants, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed. Since transgenic expression of the nucleic acids of the invention leads to phenotypic changes, plants comprising the recombinant nucleic acids of the invention can be sexually crossed with a second plant to obtain a final product. Thus, the seed of the invention can be derived from a cross between two transgenic plants of the invention, or a cross between a plant of the invention and another plant. The desired effects (e.g., expression of the polypeptides of the invention to produce a plant in which flowering behavior is altered) can be enhanced when both parental plants express the polypeptides (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme) of the invention. The desired effects can be passed to future plant generations by standard propagation means.

In one aspect, the nucleic acids and polypeptides of the invention are expressed in or inserted in any plant or seed. Transgenic plants of the invention can be dicotyledonous or monocotyledonous. Examples of monocot transgenic plants of the invention are grasses, such as meadow grass (blue grass, Poa), forage grass such as festuca, lolium, temperate grass, such as Agrostis, and cereals, e.g., wheat, oats, rye, barley, rice, sorghum, and maize (corn). Examples of dicot transgenic plants of the invention are tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and cruciferous plants (family Brassicaceae), such as cauliflower, rape seed, and the closely related model organism Arabidopsis thaliana. Thus, the transgenic plants and seeds of the invention include a broad range of plants, including, but not limited to, species from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum, Pannisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

In alternative embodiments, the nucleic acids of the invention are expressed in plants which contain fiber cells, including, e.g., cotton, silk cotton tree (Kapok, Ceiba pentandra), desert willow, creosote bush, winterfat, balsa, ramie, kenaf, hemp, roselle, jute, sisal, abaca and flax. In alternative embodiments, the transgenic plants of the invention can be members of the genus Gossypium, including members of any Gossypium species, such as G. arboreum, G. herbaceum, G. barbadense, and G. hirsutum.

The invention also provides for transgenic plants to be used for producing large amounts of the polypeptides (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme or antibody) of the invention. For example, see Palmgren (1997) Trends Genet. 13:348; Chong (1997) Transgenic Res. 6:289-296 (producing human milk protein beta-casein in transgenic potato plants using an auxin-inducible, bidirectional mannopine synthase (mas1′,2′) promoter with Agrobacterium tumefaciens-mediated leaf disc transformation methods).

Using known procedures, one of skill can screen for plants of the invention by detecting the increase or decrease of transgene mRNA or protein in transgenic plants. Means for detection and quantitation of mRNAs or proteins are well known in the art.

Polypeptides and Peptides

In one aspect, the invention provides isolated, synthetic or recombinant polypeptides having a sequence identity (e.g., at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity, or homology) to an exemplary sequence of the invention, e.g., proteins having the sequence of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQ ID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180, SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ ID NO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQ ID NO:200, SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:208, SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ ID NO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQ ID NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236, SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ ID NO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQ ID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264, SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ ID NO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQ ID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292, SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ ID NO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQ ID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320, SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ ID NO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQ ID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348, SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ ID NO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQ ID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376, SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ ID NO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQ ID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404, SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ ID NO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQ ID NO:424, SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQ ID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, SEQ ID NO:520, SEQ ID NO:522 and/or SEQ ID NO:524 (see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing). The percent sequence identity can be over the full length of the polypeptide, or the identity can be over a region of at least about 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700 or more residues.
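
By way of illustration only, the distinction drawn above between identity over the full length and identity over a region can be sketched as follows. This is a minimal sketch, not the method of the invention: it assumes the two sequences have already been aligned (gaps written as "-"), it assumes percent identity is counted over alignment columns, and the function name `percent_identity` and both toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str,
                     start: int = 0, end: Optional[int] = None) -> float:
    """Percent identity over alignment columns [start, end) of two pre-aligned sequences.

    Assumption: identity is counted per alignment column; a gap ("-") opposite
    a residue counts as a mismatch. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    end = len(aligned_a) if end is None else end
    # Keep only columns in which at least one sequence has a residue.
    columns = [(a, b) for a, b in zip(aligned_a[start:end], aligned_b[start:end])
               if a != "-" or b != "-"]
    if not columns:
        raise ValueError("no aligned residues in the requested region")
    # Both-gap columns were dropped, so equality implies two identical residues.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (20 columns, one substitution, one gap).
reference = "MKTAYIAKQRQISFVKSHFS"
candidate = "MKTAYIAKQRQLSFVKSH-S"

full_length = percent_identity(reference, candidate)            # whole alignment
first_region = percent_identity(reference, candidate, 0, 10)    # first 10 columns only
```

With these toy inputs the full-length value is lower than the value for the first ten columns, which shows why the length of the comparison region is stated alongside a percent identity.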

Polypeptides of the invention can also be shorter than the full length of exemplary polypeptides. In alternative aspects, the invention provides polypeptides (peptides, fragments) ranging in size between about 5 residues and the full length of a polypeptide, e.g., an enzyme, such as a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme; exemplary sizes being of about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, or more residues, e.g., contiguous residues of an exemplary cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention. Peptides of the invention (e.g., a subsequence of an exemplary polypeptide of the invention) can be useful as, e.g., labeling probes, antigens (immunogens), tolerogens, motifs, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme active sites (e.g., “catalytic domains”), signal sequences and/or prepro domains.

In alternative aspects, polypeptides having cellulolytic activity, e.g., cellulase activity, such as endoglucanase, cellobiohydrolase and/or β-glucosidase (beta-glucosidase) activity, are members of a genus of polypeptides sharing specific structural elements, e.g., amino acid residues, that correlate with cellulolytic activity such as cellulase activity, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity. These shared structural elements can be used for the routine generation of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase variants. These shared structural elements of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention can be used as guidance for the routine generation of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme variants within the scope of the genus of polypeptides of the invention.

As used herein, the term “cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase” encompasses, but is not limited to, any polypeptide or enzyme capable of catalyzing the complete or partial breakdown and/or hydrolysis of cellulose (e.g., exemplary polypeptides of the invention, see also Tables 1, 2, and 3, Examples 1 to 7, below), or any modification or hydrolysis of a cellulose, a hemicellulose or a lignocellulosic material, e.g., a biomass material comprising cellulose, hemicellulose and lignin.

The following chart summarizes exemplary enzymatic activities of exemplary polypeptides of the invention. For example, as indicated by this chart, in alternative aspects these exemplary polypeptides have, but are not limited to, the following activities:

-   the polypeptide having the sequence of SEQ ID NO:2 (encoded by, e.g., SEQ ID NO:1) has cellobiohydrolase activity.
-   the polypeptide having the sequence of SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101) has esterase, or more particularly, glycoside hydrolase activity.
-   the polypeptide having the sequence of SEQ ID NO:104 (encoded by, e.g., SEQ ID NO:103) has glycosyl hydrolase activity.
-   the polypeptide having the sequence of SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) has glycoside hydrolase activity.
-   the polypeptide having the sequence of SEQ ID NO:108 (encoded by, e.g., SEQ ID NO:107) has endoglucanase activity.
-   the polypeptide having the sequence of SEQ ID NO:110 (encoded by, e.g., SEQ ID NO:109) has endo-1;4-beta-glucanase activity.
-   the polypeptide having the sequence of SEQ ID NO:12 (encoded by, e.g., SEQ ID NO:11) has cellobiohydrolase II activity.
-   the polypeptide having the sequence of SEQ ID NO:112 (encoded by, e.g., SEQ ID NO:111) has endo-1;4-beta-glucanase activity.
-   the polypeptide having the sequence of SEQ ID NO:114 (encoded by, e.g., SEQ ID NO:113) has endo-1;4-beta-glucanase activity.
-   the polypeptide having the sequence of SEQ ID NO:116 (encoded by, e.g., SEQ ID NO:115) has endoglucanase activity.
-   the polypeptide having the sequence of SEQ ID NO:118 (encoded by, e.g., SEQ ID NO:117) has dockerin type I, glycoside hydrolase activity.
-   the polypeptide having the sequence of SEQ ID NO:120 (encoded by, e.g., SEQ ID NO:119) has 4-beta-cellobiosidase activity.
-   etc., see below:

SEQ ID NO:  Activity Assignment  Exemplary function based on sequence identity (homology) (using BLAST)
----------  -------------------  ----------------------------------------------------------------------
1, 2        Cellobiohydrolase    cellobiohydrolase
101, 102                         Esterase: Glycoside hydrolase, family 10: Clostridium cellulosome enzyme, dockerin type I: Carbohydrate binding module, family 6 [Clostridium thermocellum ATCC 27405] gi|67849757|gb|EAM45352.1| esterase: Glycoside hydrolase, family 10: Clostr
103, 104                         glycosyl hydrolase, [Aspergillus fumigatus Af293] gi|70988747|ref|XP_749229.1| glycosyl hydrolase [Aspergillus fumigatus Af293]
105, 106                         Lipolytic enzyme, G-D-S-L: Glycoside hydrolase, family 5: Clostridium cellulosome enzyme, dockerin type I [Clostridium thermocellum ATCC 27405] gi|67850336|gb|EAM45917.1| Lipolytic enzyme, G-D-S-L: Glycoside hydrolase, family 5: Clostridium cellulosome enzyme
107, 108                         endoglucanase [Thermotoga maritima MSB8]
109, 110                         endo-1;4-beta-glucanase [Cellvibrio japonicus]
11, 12      Cellobiohydrolase    Cellobiohydrolase II [Talaromyces emersonii].
111, 112                         endo-1;4-beta-glucanase [Cellvibrio japonicus]
113, 114                         endo-1;4-beta-glucanase [Cellvibrio japonicus]
115, 116                         endoglucanase [Anaerocellum thermophilum].
117, 118                         Glycoside hydrolase, family 8: Clostridium cellulosome enzyme, dockerin type I [Clostridium thermocellum ATCC 27405] gi|67851770|gb|EAM47333.1| Glycoside hydrolase, family 8: Clostridium cellulosome enzyme, dockerin type I [Clostridium thermocellum ATCC 274
119, 120                         cellulose 1;4-beta-cellobiosidase [Streptomyces avermitilis MA-4680]
121, 122                         secreted hydrolase [Streptomyces coelicolor A3(2)]
123, 124                         ENDOGLUCANASE A PRECURSOR (ENDO-1,4-BETA-GLUCANASE) (CELLULASE).
125, 126                         E I beta-1;4-endoglucanase precursor [Acidothermus cellulolyticus]
127, 128                         endo-beta-1,4-glucanase; McenA [Micromonospora cellulolyticum].
129, 130                         endoglucanase [Thermotoga maritima MSB8]
13, 14      β-glucosidase        Beta-glucosidase/6-phospho-beta-glucosidase/beta-galactosidase [Thermoanaerobacter tengcongensis].
131, 132                         probable endoglucanase precursor [Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150]
133, 134                         Cellulase [Ralstonia eutropha JMP134]
135, 136                         endo-1;4-beta-glucanase [Cellvibrio japonicus]
137, 138                         cellulase A [Cellvibrio mixtus].
139, 140                         endo-1;4-beta-D-glucanase [uncultured bacterium]
141, 142                         cellulase B [Cellvibrio mixtus].
143, 144                         ENDOGLUCANASE X (EGX) (ENDO-1,4-BETA-GLUCANASE) (CELLULASE).
145, 146                         Glycoside hydrolase, family 9: Clostridium cellulosome enzyme, dockerin type I: Glycoside hydrolase, family 9, N-terminal, Ig-like [Clostridium thermocellum ATCC 27405] gi|67850314|gb|EAM45895.1| Glycoside hydrolase, family 9: Clostridium cellulosome enzyme,
147, 148    β-glucosidase        XYLOSIDASE/ARABINOSIDASE [INCLUDES: BETA-XYLOSIDASE (1,4-BETA-D-XYLAN XYLOHYDROLASE) (XYLAN 1,4-BETA-XYLOSIDASE); ALPHA-L-ARABINOFURANOSIDASE (ARABINOSIDASE)].
149, 150    β-glucosidase        Beta-glycosidase (lacS) [Sulfolobus solfataricus P2]
15, 16      β-glucosidase        COG1472: Beta-glucosidase-related glycosidases [Cytophaga hutchinsonii]
151, 152                         endoglucanase [Thermotoga maritima MSB8]
153, 154                         endoglucanase [Thermotoga maritima MSB8]
155, 156                         chitinase; containing dual catalytic domains [Thermococcus kodakarensis KOD1]
157, 158                         COG3405: Endoglucanase Y [Escherichia coli E22] gi|75237469|ref|ZP_00721502.1| COG3405: Endoglucanase Y [Escherichia coli E110019] gi|75231623|ref|ZP_00717981.1| COG3405: Endoglucanase Y [Escherichia coli B7A] gi|75210076|ref|ZP_00710255.1| COG3405: Endog
159, 160                         endo-1;4-beta-glucanase [Bacillus subtilis subsp. subtilis str. 168]
161, 162                         beta-1,4-glucanase [Clostridium cellulolyticum].
163, 164    β-glucosidase        1;4-beta-D-glucan glucohydrolase [Microbulbifer hydrolyticus]
165, 166                         endoglucanase [Thermotoga maritima MSB8]
167, 168                         COG2730: Endoglucanase [Microbulbifer degradans 2-40]
169, 170                         COG2730: Endoglucanase [Microbulbifer degradans 2-40]
17, 18      β-glucosidase        beta-glucosidase [Streptomyces coelicolor A3(2)]
171, 172    β-glucosidase        beta-glucosidase [Streptomyces coelicolor A3(2)]
173, 174                         cellulase (EC 3.2.1.4), alkaline - Bacillus sp. (strain KSM-S237).
175, 176                         COG2730: Endoglucanase [Microbulbifer degradans 2-40]
177, 178                         COG2730: Endoglucanase [Microbulbifer degradans 2-40]
179, 180    β-glucosidase        BglY [Paenibacillus sp. C7]
181, 182                         endoglucanase [Anaerocellum thermophilum].
183, 184                         endoglucanase [Anaerocellum thermophilum].
185, 186                         endoglucanase [Anaerocellum thermophilum].
187, 188    β-glucosidase        beta-glycosidase [Thermosphaera aggregans]
189, 190                         probable endoglucanase precursor [Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150]
19, 20      Cellobiohydrolase    Cellobiohydrolase II [Acremonium cellulolyticus Y-94].
191, 192                         cellulose [Spirotrichonympha leidyi]
193, 194                         ENDOGLUCANASE PRECURSOR (ENDO-1,4-BETA-GLUCANASE) (CELLULASE).
195, 196                         endo-1;4-beta-glucanase; Glycoside hydrolase Family 5 [Bacillus licheniformis ATCC 14580]
197, 198                         endoglucanase [Thermotoga maritima MSB8]
199, 200                         endoglucanase fragment [Aquifex aeolicus VF5]
201, 202                         COG3405: Endoglucanase Y [Cytophaga hutchinsonii]
203, 204                         endo-1,4-beta-D-glucanase precursor [Pectobacterium chrysanthemi].
205, 206                         COG5297: Cellobiohydrolase A (1;4-beta-cellobiosidase A) [Microbulbifer degradans 2-40]
207, 208                         probable endoglucanase precursor [Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150]
209, 210                         ENDOGLUCANASE PRECURSOR (ENDO-1,4-BETA-GLUCANASE) (CELLULASE).
21, 22      Cellobiohydrolase    hypothetical protein MG07809.4 [Magnaporthe grisea 70-15] ref|XP_367905.1| hypothetical protein MG07809.4 [Magnaporthe grisea 70-15]
211, 212    β-glucosidase        Glycoside hydrolase, family 1 [Solibacter usitatus Ellin6076] gi|67858748|gb|EAM53848.1| Glycoside hydrolase, family 1 [Solibacter usitatus Ellin6076]
213, 214                         endo-1;4-beta-glucanase b [Pyrococcus furiosus DSM 3638]
215, 216                         endo-1;4-beta-glucanase b [Pyrococcus furiosus DSM 3638]
217, 218                         endoglucanase [Thermotoga maritima MSB8]
219, 220                         cellulase A [Cellvibrio mixtus].
221, 222                         ENDOGLUCANASE A PRECURSOR (ENDO-1,4-BETA-GLUCANASE) (CELLULASE A) (EG-A).
223, 224                         cellulase [Streptomyces halstedii]
225, 226    β-glucosidase        Beta-glucosidase [Burkholderia sp. 383]
227, 228    β-glucosidase        Glycoside hydrolase, family 1 [Solibacter usitatus Ellin6076] gi|67858748|gb|EAM53848.1| Glycoside hydrolase, family 1 [Solibacter usitatus Ellin6076]
229, 230    β-glucosidase        beta-glucosidase [Streptomyces avermitilis MA-4680]
23, 24      β-glucosidase        COG1472: Beta-glucosidase-related glycosidases [Clostridium thermocellum ATCC 27405]
231, 232                         xylanase [Bifidobacterium adolescentis]
233, 234                         B-1,4-endoglucanase [Prevotella bryantii].
235, 236                         cellulase A [Cellvibrio mixtus].
237, 238                         B-1,4-endoglucanase [Prevotella bryantii].
239, 240                         endoglucanase A [Ruminococcus albus]
241, 242                         COG3405: Endoglucanase Y [Escherichia coli F11]
243, 244                         yvfO [Bacillus subtilis subsp. subtilis str. 168]
245, 246                         BH2023 [Bacillus halodurans C-125]
247, 248                         BH2023 [Bacillus halodurans C-125]
249, 250    β-glucosidase        beta-glucosidase [Pyrococcus furiosus]
25, 26      Cellobiohydrolase    Cellobiohydrolase [Irpex lacteus]
251, 252    β-glucosidase        beta-glucosidase [Pyrococcus furiosus]
253, 254    β-glucosidase        beta-glucosidase [Thermotoga maritima MSB8]
255, 256                         endoglucanase-N257 [Bacillus circulans].
257, 258                         cellulase; endo-beta-1,4-glucanase [Bacillus subtilis].
259, 260                         cellulase [Spirotrichonympha leidyi]
261, 262                         endo-1;4-beta-glucanase b [Pyrococcus furiosus DSM 3638]
263, 264    β-glucosidase        beta-glucosidase [Pyrococcus furiosus]
265, 266    β-glucosidase        beta-glucosidase [Streptomyces coelicolor A3(2)]
267, 268                         beta-1,4-endoglucanase [Cellulomonas pachnodae].
269, 270                         cellulase [Streptomyces halstedii]
27, 28      Cellobiohydrolase    hypothetical protein MG04499.4 [Magnaporthe grisea 70-15] ref|XP_362054.1| hypothetical protein MG04499.4 [Magnaporthe grisea 70-15]
271, 272    β-glucosidase        beta-glucosidase [Streptomyces coelicolor A3(2)]
273, 274                         cellulase [Pseudomonas sp. ND137].
275, 276                         beta(1;4)-glucan glucanohydrolase precursor [Pectobacterium carotovorum subsp. carotovorum]
277, 278                         COG2730: Endoglucanase [Cytophaga hutchinsonii]
279, 280                         cellulase [uncultured bacterium]
281, 282                         secreted endoglucanase. [Streptomyces coelicolor A3(2)]
283, 284                         E I beta-1;4-endoglucanase precursor [Acidothermus cellulolyticus]
285, 286                         ENDOGLUCANASE PRECURSOR (ENDO-1,4-BETA-GLUCANASE) (ALKALINE CELLULASE).
287, 288                         COG2730: Endoglucanase [Cytophaga hutchinsonii]
289, 290                         ENDOGLUCANASE PRECURSOR (ENDO-1;4-BETA-GLUCANASE) PROTEIN [Ralstonia solanacearum]
29, 30      β-glucosidase        COG2723: Beta-glucosidase/6-phospho-beta-glucosidase/beta-galactosidase [Novosphingobium aromaticivorans DSM 12444]
291, 292                         PKD: Glycoside hydrolase, family 9: Clostridium cellulosome enzyme, dockerin type I: Glycoside hydrolase, family 9, N-terminal, Ig-like [Clostridium thermocellum ATCC 27405] gi|67850728|gb|EAM46301.1| PKD: Glycoside hydrolase, family 9: Clostridium cellulosome
293, 294                         endoglucanase [Anaerocellum thermophilum].
295, 296                         Cellulase [Frankia sp. EAN1pec] gi|68196961|gb|EAN11335.1| Cellulase [Frankia sp. EAN1pec]
297, 298    β-glucosidase        beta-glucosidase (gentiobiase) [Bacteroides thetaiotaomicron VPI-5482]
299, 300                         COG2730: Endoglucanase [Cytophaga hutchinsonii]
3, 4        β-glucosidase        glucan 1,4-beta-glucosidase [Xanthomonas axonopodis pv. citri str. 306].
301, 302                         cellulase A [Cellvibrio mixtus].
303, 304    β-glucosidase        beta-glucosidase [uncultured bacterium]
305, 306                         Glycoside hydrolase, family 5 [Clostridium thermocellum ATCC 27405] gi|67850654|gb|EAM46228.1| Glycoside hydrolase, family 5 [Clostridium thermocellum ATCC 27405] gi|121821|sp|P23340|GUNC_CLOSF Endoglucanase C307 precursor (Endo-1,4-beta-glucanase) (Cellu
307, 308                         beta-1;4-endoglucanase precursor [Thermobifida fusca]
309, 310    β-glucosidase        N-acetyl-beta-glucosaminidase [Cellulomonas fimi].
31, 32      β-glucosidase        beta-glucosidase [Thermotoga maritima]
311, 312    β-glucosidase        Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Clostridium thermocellum ATCC 27405] gi|67851719|gb|EAM47282.1| Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Clostridium thermocel
313, 314                         458aa long hypothetical endo-1;4-beta-glucanase [Pyrococcus horikoshii OT3]
315, 316                         Glycoside hydrolase, family 5 [Clostridium thermocellum ATCC 27405] gi|67850654|gb|EAM46228.1| Glycoside hydrolase, family 5 [Clostridium thermocellum ATCC 27405] gi|121821|sp|P23340|GUNC_CLOSF Endoglucanase C307 precursor (Endo-1,4-beta-glucanase) (Cellu
317, 318                         458aa long hypothetical endo-1;4-beta-glucanase [Pyrococcus horikoshii OT3]
319, 320    β-glucosidase        Beta-glucosidase [Clostridium thermocellum ATCC 27405] gi|67851799|gb|EAM47362.1| Beta-glucosidase [Clostridium thermocellum ATCC 27405]
321, 322    β-glucosidase        |1W3J|B Chain B; Family 1 B-Glucosidase From Thermotoga Maritima In Complex With Tetrahydrooxazine
323, 324    β-glucosidase        Glycoside hydrolase, family 1 [Rhodoferax ferrireducens DSM 15236] gi|72603011|gb|EAO39027.1| Glycoside hydrolase, family 1 [Rhodoferax ferrireducens DSM 15236]
325, 326    β-glucosidase        beta-glucosidase [Thermotoga maritima]
327, 328    β-glucosidase        beta-glucosidase [Thermotoga maritima]
329, 330    β-glucosidase        Beta-glucosidase [Rhodobacter sphaeroides ATCC 17029] gi|83363682|gb|EAP67177.1| Beta-glucosidase [Rhodobacter sphaeroides ATCC 17029]
33, 34      Cellobiohydrolase    GUXC_FUSOX exoglucanase type C precursor (Exocellobiohydrolase 1) (1,4-beta-cellobiohydrolase) (Beta-glucancellobiohydrolase) [Gibberella zeae PH-1] gb|AA042612.2| exoglucanase type C precursor [Gibberella zeae] ref|XP_380747.1| GUXC_FUSOX Putati
331, 332    β-glucosidase        cellulase [Prevotella ruminicola].
333, 334    β-glucosidase        beta-glucosidase [Thermotoga maritima]
335, 336    β-glucosidase        Beta-glucosidase [Clostridium thermocellum ATCC 27405] gi|67851799|gb|EAM47362.1| Beta-glucosidase [Clostridium thermocellum ATCC 27405]
337, 338                         endoglucanase [Thermotoga maritima MSB8]
339, 340    β-glucosidase        beta-glucosidase [Bradyrhizobium japonicum USDA 110]
341, 342    β-glucosidase        Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223] gi|76589196|gb|EAO65595.1| Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223]
343, 344    β-glucosidase        endoglucanase [Thermotoga maritima MSB8]
345, 346    β-glucosidase        beta-glucosidase [Bradyrhizobium japonicum USDA 110]
347, 348    β-glucosidase        Glycoside hydrolase, family 1 [Chloroflexus aurantiacus J-10-fl] gi|76164330|gb|EAO58481.1| Glycoside hydrolase, family 1 [Chloroflexus aurantiacus J-10-fl]
349, 350    β-glucosidase        Beta-glucosidase [Deinococcus geothermalis DSM 11300] gi|66780499|gb|EAL81486.1| Beta-glucosidase [Deinococcus geothermalis DSM 11300]
35, 36      Cellobiohydrolase    Cellobiohydrolase [Trichoderma harzianum]
351, 352    β-glucosidase        beta-glucosidase [Bradyrhizobium japonicum USDA 110]
353, 354                         NanG8 [Streptomyces nanchangensis].
355, 356    β-glucosidase        Beta-glucosidase [Deinococcus geothermalis DSM 11300] gi|66780499|gb|EAL81486.1| Beta-glucosidase [Deinococcus geothermalis DSM 11300]
357, 358    β-glucosidase        beta-glucosidase [Streptomyces coelicolor A3(2)]
359, 360                         endoglucanase [Thermotoga maritima MSB8]
361, 362    β-glucosidase        423aa long hypothetical beta-glucosidase [Pyrococcus horikoshii OT3]
363, 364    β-glucosidase        Beta-glucosidase [Rubrobacter xylanophilus DSM 9941] gi|68512480|gb|EAN36288.1| Beta-glucosidase [Rubrobacter xylanophilus DSM 9941]
365, 366    β-glucosidase        beta-glucosidase [Thermus thermophilus HB8]
367, 368    β-glucosidase        Beta-glucosidase/6-phospho-beta-glucosidase/beta-galactosidase [Hahella chejuensis KCTC 2396]
369, 370    β-glucosidase        COG1293: Predicted RNA-binding protein homologous to eukaryotic snRNP [Cytophaga hutchinsonii]
37, 38      Endoglucanase        probable cellulose [Bradyrhizobium japonicum USDA 110]
371, 372                         458aa long hypothetical endo-1;4-beta-glucanase [Pyrococcus horikoshii OT3]
373, 374    β-glucosidase        Beta-glucosidase [Burkholderia sp. 383]
375, 376    β-glucosidase        beta-glucosidase [uncultured murine large bowel bacterium BAC 31B]
377, 378    β-glucosidase        beta-glucosidase [Thermotoga maritima MSB8]
379, 380    β-glucosidase        exo-1,4-beta-glucosidase [Prevotella albensis].
381, 382    β-glucosidase        Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Sphingopyxis alaskensis RB2256] gi|68524235|gb|EAN47359.1| Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Sphingopyxis alaskensis RB
383, 384    β-glucosidase        beta-glucosidase [Thermotoga maritima]
385, 386    β-glucosidase        beta-glucosidase [Bacteroides thetaiotaomicron VPI-5482]
387, 388    β-glucosidase        Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223] gi|76589196|gb|EAO65595.1| Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223]
389, 390    β-glucosidase        Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Chlorobium phaeobacteroides BS1] gi|67913451|gb|EAM62862.1| Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Chlorobium phaeobacteroid
39, 40      Endoglucanase        |LIC20191| hypothetical protein LIC20191 [Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130]
391, 392    β-glucosidase        beta-glucosidase [Thermotoga maritima]
393, 394    β-glucosidase        xylosidase/arabinosidase [Colwellia psychrerythraea 34H]
395, 396    β-glucosidase        beta-D-glucosidase [Novosphingobium aromaticivorans DSM 12444] gi|78774097|gb|EAP37753.1| beta-D-glucosidase [Novosphingobium aromaticivorans DSM 12444]
397, 398    β-glucosidase        |1GON|B Chain B; B-Glucosidase From Streptomyces Sp
399, 400    β-glucosidase        Beta-glucosidase [Rubrobacter xylanophilus DSM 9941] gi|68512480|gb|EAN36288.1| Beta-glucosidase [Rubrobacter xylanophilus DSM 9941]
401, 402    β-glucosidase        beta-glucosidase [uncultured murine large bowel bacterium BAC 31B]
403, 404    β-glucosidase        glycosyl hydrolase; family 3 [Enterococcus faecalis V583]
405, 406    β-glucosidase        thermostable beta-glucosidase B [Clostridium beijerincki NCIMB 8052] gi|82726826|gb|EAP61562.1| thermostable beta-glucosidase B [Clostridium beijerincki NCIMB 8052]
407, 408    β-glucosidase        Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223] gi|76589196|gb|EAO65595.1| Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223]
409, 410    β-glucosidase        beta-glucosidase [Thermotoga maritima]
41, 42      β-glucosidase        beta-glucosidase [Agrobacterium tumefaciens str. C58 (U. Washington)].
411, 412    β-glucosidase        1;4-beta-D-glucan glucohydrolase [Microbulbifer hydrolyticus]
413, 414    β-glucosidase        Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Sphingopyxis alaskensis RB2256] gi|68524235|gb|EAN47359.1| Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Sphingopyxis alaskensis RB
415, 416                         cellulase [Prevotella ruminicola].
417, 418    β-glucosidase        Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Sphingopyxis alaskensis RB2256] gi|68524235|gb|EAN47359.1| Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Sphingopyxis alaskensis RB
419, 420    β-glucosidase        glucocerebrosidase [Paenibacillus sp. TS12].
421, 422    β-glucosidase        1,4-B-D-glucan glucohydrolase [Pseudomonas fluorescens].
423, 424    β-glucosidase        Glycoside hydrolase, family 1 [Solibacter usitatus Ellin6076] gi|67858748|gb|EAM53848.1| Glycoside hydrolase, family 1 [Solibacter usitatus Ellin6076]
425, 426    β-glucosidase        N-acetyl-beta-glucosaminidase [Cellulomonas fimi].
427, 428                         endoglucanase [Thermotoga maritima MSB8]
429, 430                         endoglucanase [Thermotoga maritima MSB8]
43, 44      Cellobiohydrolase    cellobiohydrolase II [Acremonium cellulolyticus Y-94].
431, 432    β-glucosidase        Conserved hypothetical protein [Novosphingobium aromaticivorans DSM 12444] gi|78775241|gb|EAP38896.1| Conserved hypothetical protein [Novosphingobium aromaticivorans DSM 12444]
433, 434                         PFAM: nucleotidyl-sugar pyranose mutase [Campylobacter jejuni] galactopyranose mutase
435, 436    β-glucosidase        Beta-glucosidase [Rhodobacter sphaeroides ATCC 17029] gi|83363682|gb|EAP67177.1| Beta-glucosidase [Rhodobacter sphaeroides ATCC 17029]
437, 438    β-glucosidase        Beta-glucosidase [Deinococcus geothermalis DSM 11300] gi|66780499|gb|EAL81486.1| Beta-glucosidase [Deinococcus geothermalis DSM 11300]
439, 440                         IMP dehydrogenase/GMP reductase: Beta-lactamase [Rhodopseudomonas palustris BisA53] gi|77696568|gb|EAO87746.1| IMP dehydrogenase/GMP reductase: Beta-lactamase [Rhodopseudomonas palustris BisA53]
441, 442                         alpha-glucuronidase [Thermotoga maritima]
443, 444                         xylanase [Microbulbifer hydrolyticus]
445, 446                         exo-cellobiohydrolase [Penicillium chrysogenum]
447, 448                         extra-cellular xylanase [Geobacillus stearothermophilus]
449, 450                         hypothetical protein MG07908.4 [Magnaporthe grisea 70-15]
45, 46      Cellobiohydrolase    C-family cellulose homologue 2
451, 452                         EXOGLUCANASE TYPE C PRECURSOR (EXOCELLOBIOHYDROLASE I) (1,4-BETA-CELLOBIOHYDROLASE) (BETA-GLUCANCELLOBIOHYDROLASE).
453, 454                         glycosyl hydrolase [Clostridium beijerincki NCIMB 8052]
455, 456                         alpha-L-arabinofuranosidase [Clostridium stercorarium].
457, 458                         pectate lyase [uncultured bacterium]
459, 460                         Glycoside Hydrolase Family 51 [Bacillus licheniformis ATCC 14580]
461, 462                         alpha-L-arabinofuranosidase [Bacillus subtilis].
463, 464                         alpha-L-arabinosidase; beta-xylosidase [Bacillus subtilis subsp. subtilis str. 168]
465, 466                         alpha-L-arabinofuranosidase [Bacillus halodurans C-125]
467, 468                         alpha-L-arabinofuranosidase [Thermotoga maritima]
469, 470                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
47, 48      Endoglucanase        cellulase [Prevotella ruminicola].
471, 472                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
473, 474                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
475, 476                         alpha-L-arabinofuranosidase [Streptomyces coelicolor A3(2)]
477, 478                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
479, 480                         arabinase-TS [Bacillus sp. TS-3]
481, 482                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
483, 484                         Alpha-L-arabinofuranosidase [Arthrobacter sp. FB24] gi|66963675|ref|ZP_00411246.1| Alpha-L-arabinofuranosidase [Arthrobacter sp. FB24]
485, 486                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
487, 488                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
489, 490                         alpha-L-arabinofuranosidase [Geobacillus stearothermophilus]
49, 50      β-glucosidase        beta-glucosidase [Bacillus sp.].
491, 492                         |1GON|B Chain B; B-Glucosidase From Streptomyces Sp
493, 494                         xylosidase/arabinosidase [Caulobacter crescentus CB15]
495, 496                         XYLOSIDASE/ARABINOSIDASE [INCLUDES: BETA-XYLOSIDASE (1,4-BETA-D-XYLAN XYLOHYDROLASE) (XYLAN 1,4-BETA-XYLOSIDASE); ALPHA-L-ARABINOFURANOSIDASE (ARABINOSIDASE)].
497, 498                         xylan beta-1;4-xylosidase [Bacillus halodurans C-125]
499, 500                         Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Solibacter usitatus Ellin6076] gi|67861669|gb|EAM56697.1| Glycoside hydrolase, family 3, N-terminal: Glycoside hydrolase, family 3, C-terminal [Solibacter usitatus Ellin60
5, 6        β-glucosidase        beta-xylosidase B [Clostridium stercorarium].
501, 502                         XYLOSIDASE/ARABINOSIDASE [INCLUDES: BETA-XYLOSIDASE (1,4-BETA-D-XYLAN XYLOHYDROLASE) (XYLAN 1,4-BETA-XYLOSIDASE); ALPHA-L-ARABINOFURANOSIDASE (ARABINOSIDASE)].
503, 504                         XYLOSIDASE/ARABINOSIDASE [INCLUDES: BETA-XYLOSIDASE (1,4-BETA-D-XYLAN XYLOHYDROLASE) (XYLAN 1,4-BETA-XYLOSIDASE); ALPHA-L-ARABINOFURANOSIDASE (ARABINOSIDASE)].
505, 506                         XYLOSIDASE/ARABINOSIDASE [INCLUDES: BETA-XYLOSIDASE (1,4-BETA-D-XYLAN XYLOHYDROLASE) (XYLAN 1,4-BETA-XYLOSIDASE); ALPHA-L-ARABINOFURANOSIDASE (ARABINOSIDASE)].
507, 508                         pectate lyase [uncultured bacterium]
509, 510                         pectate lyase [uncultured bacterium]
51, 52      Cellobiohydrolase    exo-cellobiohydrolase I [Penicillium janthinellum]
511, 512                         pectate lyase [uncultured bacterium]
513, 514                         pectate lyase [uncultured bacterium]
515, 516                         hypothetical protein [Neurospora crassa]
517, 518                         PehC [Ralstonia solanacearum].
53, 54      Cellobiohydrolase    Cellobiohydrolase I [Penicillium occitanis]
519, 520    Oligomerase          hypothetical protein SNOG_08993 [Phaeosphaeria nodorum SN15]
521, 522    Oligomerase          beta-glucosidase [Phaeosphaeria avenaria f. sp. triticae]
523, 524    xylanase             endo-1;4-beta-xylanase precursor [uncultured bacterium]
55, 56      Cellobiohydrolase    exoglucanase [Alternaria alternata].
57, 58      β-glucosidase        glucan 1,4-beta-glucosidase (EC 3.2.1.74) - Pseudomonas fluorescens subsp. cellulosa.
59, 60      Cellobiohydrolase    |A38979| cellulose 1;4-beta-cellobiosidase (EC 3.2.1.91) II - fungus (Trichoderma viride)
61, 62      Endoglucanase        cellodextrinase.
63, 64      Cellobiohydrolase    hypothetical protein MG04499.4 [Magnaporthe grisea 70-15] ref|XP_362054.1| hypothetical protein MG04499.4 [Magnaporthe grisea 70-15]
65, 66      Cellobiohydrolase    Cellobiohydrolase II [Trichoderma parceramosum]
67, 68      Cellobiohydrolase    Cellobiohydrolase II [Talaromyces emersonii].
69, 70      β-glucosidase        beta-glucosidase [Methylococcus capsulatus str. Bath] gb|AAU92142.1| beta-glucosidase [Methylococcus capsulatus str. Bath]
7, 8        β-glucosidase        beta-glucosidase [Bacteroides fragilis YCH46]
71, 72      Cellobiohydrolase    Cellobiohydrolase II [Talaromyces emersonii].
73, 74      Cellobiohydrolase    cellulase CEL2 [Leptosphaeria maculans].
75, 76      β-glucosidase        beta-glucosidase [Bradyrhizobium japonicum USDA 110]
77, 78      Cellobiohydrolase    Cellobiohydrolase 1 catalytic domain [Talaromyces emersonii]
79, 80      β-glucosidase        beta-mannosidase [Pyrococcus furiosus DSM 3638].
81, 82      Cellobiohydrolase    Cellobiohydrolase II [Talaromyces emersonii].
83, 84      Cellobiohydrolase    Cellobiohydrolase D [Aspergillus oryzae].
85, 86      Endoglucanase        cellulase [Prevotella ruminicola].
87, 88      Cellobiohydrolase    hypothetical protein MG07809.4 [Magnaporthe grisea 70-15] ref|XP_367905.1| hypothetical protein MG07809.4 [Magnaporthe grisea 70-15]
89, 90      β-glucosidase        beta-D-glucosidase [Caulobacter crescentus].
9, 10       Cellobiohydrolase    Cellobiohydrolase I [Penicillium occitanis]
91, 92                           alpha-L-arabinofuranosidase ArfA [Clostridium cellulovorans].
93, 94      β-glucosidase        Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223] gi|76589196|gb|EAO65595.1| Beta-glucosidase [Thermoanaerobacter ethanolicus ATCC 33223]
95, 96                           beta-xylosidase [Geobacillus stearothermophilus].
97, 98                           hypothetical protein AN5282.2 [Aspergillus nidulans FGSC A4]
99, 100                          beta-1,4-xylanase [Pseudomonas sp. ND137].

Oligomerases

The invention also provides polypeptides of the invention having oligomerase enzymatic activity, e.g., an oligomerase-1 (a β-glucosidase) or an oligomerase-2 (a β-xylosidase), or able to catalyze the hydrolysis of (degrade) soluble cellooligosaccharides and/or arabinoxylan oligomers into monomers, such as xylose, arabinose and glucose. For example, the exemplary polypeptides of the invention SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521) and SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519) have oligomerase enzymatic activity.

During the enzymatic hydrolysis of hemicellulose and cellulose in biomass (e.g., alkaline pretreated corn biomass), the insoluble polymeric substrates are first converted into soluble oligomers such as arabinoxylooligosaccharides and cellooligosaccharides, and these soluble oligomers can be further degraded to fermentable, monomeric sugars by oligomerase polypeptides of the invention, e.g., SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521) and SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519). Thus, the invention also provides methods for “converting” arabinoxylooligosaccharides and cellooligosaccharides to monomeric sugars (e.g., fermentable, monomeric sugars, such as xylose, arabinose and glucose). The invention also provides methods for treating biomass, e.g., corn, such as alkaline pretreated corn biomass, to “convert” the biomass to fermentable, monomeric sugars using one or any combination of enzymes of the invention, including one or more of the polypeptides of the invention having oligomerase enzymatic activity.

In an alternative aspect, enzymes of the invention are combined together in various combinations, or are combined with other oligomerases, cellulases and/or hemicellulases to form enzyme cocktails that enable the conversion of plant biomass (e.g., corn, grasses) to fermentable monomeric sugars, such as xylose, arabinose and glucose. A representative enzyme cocktail is listed below:

Enzyme                                               Class                     Usage (mg/g cellulose)
SEQ ID NO: 106 (encoded, e.g., by SEQ ID NO: 105)    Endoglucanase             1.7
SEQ ID NO: 34 (encoded, e.g., by SEQ ID NO: 33)      GH 7 cellobiohydrolase    1
SEQ ID NO: 98 (encoded, e.g., by SEQ ID NO: 97)      GH 6 cellobiohydrolase    5
SEQ ID NO: 94 (encoded, e.g., by SEQ ID NO: 93)      β-glucosidase             1.3
SEQ ID NO: 100 (encoded, e.g., by SEQ ID NO: 99)     GH11 endoxylanase         0.6
SEQ ID NO: 102 (encoded, e.g., by SEQ ID NO: 101)    GH10 endoxylanase         0.2
SEQ ID NO: 96 (encoded, e.g., by SEQ ID NO: 95)      β-xylosidase              0.5
SEQ ID NO: 92 (encoded, e.g., by SEQ ID NO: 91)      Arabinofuranosidase       0.3
SEQ ID NO: 520 (encoded, e.g., by SEQ ID NO: 519)    GH 3 oligomerase          0.5
SEQ ID NO: 522 (encoded, e.g., by SEQ ID NO: 521)    GH 3 oligomerase          0.5
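The dosages in the representative cocktail listed above can be tallied with a short calculation. The following is a minimal Python sketch only: the names COCKTAIL, total_loading and enzyme_mass_mg are illustrative assumptions and not part of the disclosure, and the usage values for the GH 7 and GH 6 cellobiohydrolases are read here as 1 and 5 mg/g cellulose, respectively.

```python
# Representative cocktail: (polypeptide SEQ ID NO, enzyme class, mg enzyme per g cellulose).
# Assumption: the GH 7 and GH 6 cellobiohydrolase dosages are read as 1 and 5 mg/g.
COCKTAIL = [
    (106, "Endoglucanase", 1.7),
    (34, "GH 7 cellobiohydrolase", 1.0),
    (98, "GH 6 cellobiohydrolase", 5.0),
    (94, "beta-glucosidase", 1.3),
    (100, "GH11 endoxylanase", 0.6),
    (102, "GH10 endoxylanase", 0.2),
    (96, "beta-xylosidase", 0.5),
    (92, "Arabinofuranosidase", 0.3),
    (520, "GH 3 oligomerase", 0.5),
    (522, "GH 3 oligomerase", 0.5),
]


def total_loading(cocktail):
    """Return the summed enzyme dosage in mg per g cellulose."""
    return sum(dose for _, _, dose in cocktail)


def enzyme_mass_mg(cocktail, cellulose_g):
    """Return a dict mapping SEQ ID NO to mg of enzyme needed for cellulose_g grams of cellulose."""
    return {seq_id: dose * cellulose_g for seq_id, _, dose in cocktail}


if __name__ == "__main__":
    print(f"All ten enzymes: {total_loading(COCKTAIL):.1f} mg/g cellulose")
    print(f"First eight enzymes: {total_loading(COCKTAIL[:8]):.1f} mg/g cellulose")
    print(enzyme_mass_mg(COCKTAIL, 10.0))
```

Under this reading, the ten entries sum to about 11.6 mg enzyme per gram of cellulose, of which about 10.6 mg/g is contributed by the eight non-oligomerase enzymes.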

FIG. 121 illustrates HPLC traces of two saccharification reaction products (at 48 hr of incubation), these data demonstrating the role oligomerases (and in this study, oligomerases of this invention) play in the degradation of soluble oligomeric saccharides. Peaks at 13 and 16 minutes are soluble oligomeric arabinoxylan and cellooligosaccharides, respectively. In FIG. 121, the top panel illustrates an HPLC trace of a saccharification reaction using a “cocktail” designated “E8” with no oligomerases (the first eight enzymes noted above, or, endoglucanase, GH 7 cellobiohydrolase, GH 6 cellobiohydrolase, β-glucosidase and an arabinofuranosidase). Thus, the invention provides compositions comprising the “cocktail” comprising an endoglucanase, a GH 7 cellobiohydrolase, a GH 6 cellobiohydrolase, a β-glucosidase and an arabinofuranosidase, and in one aspect, at least one, several or all are enzymes of this invention, e.g., exemplary enzymes of this invention, e.g., SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93), SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99), SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101), SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95), and/or SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91).

In FIG. 121, the bottom panel illustrates an HPLC trace of a saccharification reaction using a “cocktail” comprising this “E8” cocktail plus two oligomerases I and II, as noted above (these exemplary enzymes of the invention are SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519) and SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521)). These data clearly demonstrate that addition of these exemplary oligomerase enzymes of the invention decreases the levels of oligomeric arabinoxylan and cellooligosaccharides and increases the amount of monomeric (fermentable) sugars. Thus, the invention provides compositions comprising “cocktails” of cellulose degrading enzymes, such as the exemplary “E8” mixture noted above, and an oligomerase, and in one aspect, at least one or several oligomerase enzymes of this invention, e.g., the exemplary SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519) and SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521) enzymes of this invention.

Oligomerase II (the exemplary SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519)), when dosed at 1 mg/g of cellulose, increased both xylose (from 52% to 66%) and glucose (from 63% to 70%) yields compared to a cocktail without the exemplary SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519). FIG. 122 illustrates time course studies of the reactions with enzyme cocktails E8 (as defined above, as “set-1” in FIG. 122A) and E8 plus an oligomerase II, the exemplary SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519) (the so-called “E9” cocktail mixture, or “set-2” in FIG. 122B). In FIG. 122A, the top panel illustrates time course studies for glucose yields; and in FIG. 122B, the bottom panel illustrates time course studies for xylose yields.
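The yield figures reported above can be restated as absolute and relative gains. The following Python sketch is illustrative only; the function name yield_gain is an assumption introduced here, and only the four percentages come from the text.

```python
def yield_gain(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent of the starting yield)."""
    absolute = after_pct - before_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


if __name__ == "__main__":
    # Yields without and with oligomerase II dosed at 1 mg/g cellulose, as reported in the text.
    for sugar, before, after in (("xylose", 52, 66), ("glucose", 63, 70)):
        absolute, relative = yield_gain(before, after)
        print(f"{sugar}: +{absolute} points ({relative:.1f}% relative increase)")
```

On these figures, the xylose yield rises by 14 percentage points (about 26.9% relative to the starting yield) and the glucose yield by 7 percentage points (about 11.1% relative).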

Additional experiments demonstrated that oligomerase I (the exemplary SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521)) was capable of degrading both cellobiose and other cellooligosaccharides, thus rendering the exemplary β-glucosidase SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) unnecessary in the cocktail. Thus, another exemplary enzyme cocktail of the invention comprises the exemplary SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93), SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99), SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101), SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95), SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91) and SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521); or, the exemplary SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99), SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101), SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95), SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91) and SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519) (or SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521)).

Similarly, oligomerase II (the exemplary SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519)) was also capable of degrading both xylobiose and other soluble arabinoxylan by replacing the exemplary β-xylosidase SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95) in the cocktail. Thus, another exemplary enzyme cocktail of the invention comprises SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93), SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99), SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101), SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95), SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91) and SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519); or, SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93), SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99), SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101), SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91) and SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519) (or SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521)).
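The substitution cocktails described in the two preceding paragraphs can be expressed as simple set operations on the polypeptide SEQ ID NOs. The following Python sketch is a bookkeeping aid only; the names E8, supplement and substitute are assumptions introduced here, and the sketch says nothing about dosage.

```python
# The eight-enzyme "E8" mixture, identified by polypeptide SEQ ID NO.
E8 = frozenset({106, 34, 98, 94, 100, 102, 96, 92})

BETA_GLUCOSIDASE = 94   # exemplary beta-glucosidase
BETA_XYLOSIDASE = 96    # exemplary beta-xylosidase
OLIGOMERASE_I = 522     # exemplary oligomerase I
OLIGOMERASE_II = 520    # exemplary oligomerase II


def supplement(cocktail, added):
    """Return a new cocktail with one enzyme added (no enzyme removed)."""
    return cocktail | {added}


def substitute(cocktail, removed, added):
    """Return a new cocktail in which `added` takes the place of `removed`."""
    if removed not in cocktail:
        raise ValueError(f"SEQ ID NO:{removed} is not in the cocktail")
    return (cocktail - {removed}) | {added}


if __name__ == "__main__":
    # E8 plus oligomerase II (the "E9" mixture).
    print(sorted(supplement(E8, OLIGOMERASE_II)))
    # Oligomerase I in place of the beta-glucosidase.
    print(sorted(substitute(E8, BETA_GLUCOSIDASE, OLIGOMERASE_I)))
    # Oligomerase II in place of the beta-xylosidase.
    print(sorted(substitute(E8, BETA_XYLOSIDASE, OLIGOMERASE_II)))
```

Because a substitution removes one enzyme for each enzyme added, the number of enzymes in the cocktail is unchanged, whereas supplementation increases it by one.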

In alternative aspects, individual enzymes of the invention, or combinations (or “cocktails” or mixtures) of enzymes of the invention (which can comprise one or several non-invention enzymes) can be used to process (degrade) commercial cellulase preparations, e.g., those derived from crude fungal culture broths, such as those of Trichoderma reesei. The enzymes of the invention are added because the commercial preparations alone are deficient in many enzyme activities, e.g., hemicellulase activity, which are required to digest alkaline pretreated biomass. More importantly, enzymes of the invention are added because the majority of solubilized xylan in commercial cellulase preparations exists in oligomeric forms which, without addition of enzymes of the invention, would be resistant to further degradation to monomer sugars. Thus, the invention provides enzyme solutions to recalcitrant soluble xylooligomers, e.g., those found in commercial cellulase preparations such as those derived from crude fungal culture broths. The invention also provides enzyme solutions for degrading soluble cellooligosaccharides, although their ratio to glucose monomer is generally smaller. In one aspect, the oligomerases of this invention allow the breakdown of recalcitrant cellooligosaccharides and arabinoxylooligomers into fermentable, monomeric sugars such as glucose, xylose and arabinose.

In one aspect, enzymes of the invention, including the “cocktail” enzyme mixtures of this invention, increase the overall conversion of hemicellulose and cellulose to monomeric sugar from a biomass, e.g., a corn- or grass-comprising biomass. Without the addition of the oligomerases of this invention, a large amount of xylose remains tied up in non-fermentable oligosaccharides. Furthermore, these oligomerase enzymes of this invention can be used to replace and/or supplement other enzymes in a cocktail, for example, beta-glucosidase and/or beta-xylosidase, thereby not increasing the overall protein loading. These two exemplary oligomerase enzymes of this invention have been demonstrated to be multi-functional in that they have relaxed substrate specificities, as discussed above (and see also FIGS. 121 and 122).

Assays for Determining or Characterizing the Activity of an Enzyme

Assays for determining or characterizing the activity of an enzyme, such as determining oligomerase, cellulase, xylanase, cellobiohydrolase, β-glucosidase, β-xylosidase, arabinofuranosidase or related activity, e.g., to determine if a polypeptide is within the scope of the invention, are well known in the art, for example, see Thomas M. Wood, K. Mahalingeshwara Bhat, “Methods for Measuring Cellulase Activities”, Methods in Enzymology, 160, 87-111 (1988); U.S. Pat. Nos. 5,747,320; 5,795,766; 5,973,228; 6,022,725; 6,087,131; 6,127,160; 6,184,018; 6,423,524; 6,566,113; 6,921,655.

In some aspects, a polypeptide of the invention can have an alternative enzymatic activity. For example, the polypeptide can have endoglucanase/cellulase activity; xylanase activity; protease activity; etc.; in other words, enzymes of the invention can be multi-functional in that they have relaxed substrate specificities. In fact, studies shown herein demonstrate that two exemplary oligomerase enzymes of this invention are multi-functional in that they have relaxed substrate specificities, see discussion above.

“Amino acid” or “amino acid sequence” as used herein refers to, and includes, an oligopeptide, peptide, polypeptide, or protein sequence, or a fragment, portion, or subunit of any of these, and naturally occurring or synthetic molecules. The term “polypeptide” as used herein refers to amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres, and may contain modified amino acids other than the 20 gene-encoded amino acids. The polypeptides may be modified by either natural processes, such as post-translational processing, or by chemical modification techniques which are well known in the art. Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may have many types of modifications. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, glucan hydrolase processing, phosphorylation, prenylation, racemization, selenoylation, sulfation and transfer-RNA mediated addition of amino acids to protein such as arginylation. (See Creighton, T. E., Proteins: Structure and Molecular Properties 2nd Ed., W.H. Freeman and Company, New York (1993); Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York, pp. 1-12 (1983)). The peptides and polypeptides of the invention also include all “mimetic” and “peptidomimetic” forms, as described in further detail, below.

As used herein, the term “isolated” means that the material (e.g., a protein or nucleic acid of the invention) is removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such polynucleotides or polypeptides could be part of a composition and still be isolated in that such vector or composition is not part of its natural environment. As used herein, the term “purified” does not require absolute purity; rather, it is intended as a relative definition. Individual nucleic acids obtained from a library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The purified nucleic acids of the invention have been purified from the remainder of the genomic DNA in the organism by at least 10⁴-10⁶ fold. In one aspect, the term “purified” includes nucleic acids which have been purified from the remainder of the genomic DNA or from other sequences in a library or other environment by at least one order of magnitude, e.g., in one aspect, two or three orders, or, four or five orders of magnitude.

“Recombinant” polypeptides or proteins refer to polypeptides or proteins produced by recombinant DNA techniques; i.e., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide or protein. “Synthetic” polypeptides or proteins are those prepared by chemical synthesis. Solid-phase chemical peptide synthesis methods can also be used to synthesize the polypeptide or fragments of the invention. Such methods have been known in the art since the early 1960's (Merrifield, R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (see also Stewart, J. M. and Young, J. D., Solid Phase Peptide Synthesis, 2nd Ed., Pierce Chemical Co., Rockford, Ill., pp. 11-12) and have recently been employed in commercially available laboratory peptide design and synthesis kits (Cambridge Research Biochemicals). Such commercially available laboratory kits have generally utilized the teachings of H. M. Geysen et al, Proc. Natl. Acad. Sci., USA, 81:3998 (1984) and provide for synthesizing peptides upon the tips of a multitude of “rods” or “pins” all of which are connected to a single plate.

The phrase “substantially identical” in the context of two nucleic acids or polypeptides, refers to two or more sequences that have, e.g., at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more nucleotide or amino acid residue (sequence) identity, when compared and aligned for maximum correspondence, as measured using one of the known sequence comparison algorithms or by visual inspection. In alternative aspects, the substantial identity exists over a region of at least about 100 or more residues and most commonly the sequences are substantially identical over at least about 150 to 200 or more residues. In some aspects, the sequences are substantially identical over the entire length of the coding regions.
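As an illustration of the percent identity measure referred to above, the following Python sketch scores two sequences that have already been aligned (gaps written as "-"). It is a simplified example and not one of the known sequence comparison algorithms: the function name percent_identity is an assumption, and counting identity over all aligned columns, gapped columns included, is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: identical residues divided by the number of aligned
    columns, ignoring columns in which both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not sequences of the invention.
    print(percent_identity("ACDEFG", "ACDQFG"))
    print(percent_identity("AC-EF", "ACDEF"))
```

In practice, the alignment itself would be produced by a sequence comparison algorithm, and the resulting value would be compared against the chosen threshold (e.g., at least about 50%) over the chosen region (e.g., at least about 100 residues).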

Additionally, a “substantially identical” amino acid sequence is a sequence that differs from a reference sequence by one or more conservative or non-conservative amino acid substitutions, deletions, or insertions. In one aspect, the substitution occurs at a site that is not the active site of the molecule, or, alternatively the substitution occurs at a site that is the active site of the molecule, provided that the polypeptide essentially retains its functional (enzymatic) properties. A conservative amino acid substitution, for example, substitutes one amino acid for another of the same class (e.g., substitution of one hydrophobic amino acid, such as isoleucine, valine, leucine, or methionine, for another, or substitution of one polar amino acid for another, such as substitution of arginine for lysine, glutamic acid for aspartic acid or glutamine for asparagine). One or more amino acids can be deleted, for example, from a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptide, resulting in modification of the structure of the polypeptide, without significantly altering its biological activity. For example, amino- or carboxyl-terminal amino acids that are not required for cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme biological activity can be removed. Modified polypeptide sequences of the invention can be assayed for cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme biological activity by any number of methods, including contacting the modified polypeptide sequence with a substrate and determining whether the modified polypeptide decreases the amount of specific substrate in the assay or increases the bioproducts of the enzymatic reaction of a functional cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptide with the substrate.
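The notion of a conservative substitution described above can be illustrated with a small Python sketch that encodes only the example classes named in the text. The grouping labels, the names CLASSES, is_conservative and classify_substitutions, and the restriction to equal-length sequences are assumptions made for illustration; the sketch does not predict whether enzymatic activity is retained, which is determined by assay.

```python
# Example classes taken from the text, using one-letter amino acid codes.
# The labels are illustrative; residues not listed here are treated as unclassified.
CLASSES = {
    "hydrophobic": {"I", "V", "L", "M"},   # isoleucine, valine, leucine, methionine
    "basic polar": {"R", "K"},             # arginine, lysine
    "acidic polar": {"E", "D"},            # glutamic acid, aspartic acid
    "amide polar": {"Q", "N"},             # glutamine, asparagine
}


def is_conservative(original, replacement):
    """True if two different residues fall in the same example class."""
    if original == replacement:
        return False  # no substitution took place
    return any(
        original in members and replacement in members
        for members in CLASSES.values()
    )


def classify_substitutions(reference, variant):
    """List (1-based position, reference residue, variant residue, conservative?) for each difference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    return [
        (pos, ref, var, is_conservative(ref, var))
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not sequences of the invention.
    for change in classify_substitutions("MKVLD", "MRVLE"):
        print(change)
```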

“Fragments” as used herein are a portion of a naturally occurring protein which can exist in at least two different conformations. Fragments can have the same or substantially the same amino acid sequence as the naturally occurring protein. Fragments which have three dimensional structures different from that of the naturally occurring protein are also included. An example of this is a “pro-form” molecule, such as a low activity proprotein that can be modified by cleavage to produce a mature enzyme with significantly higher activity.

In one aspect, the invention provides crystal (three-dimensional) structures of proteins and peptides, e.g., cellulases, of the invention; which can be made and analyzed using the routine protocols well known in the art, e.g., as described in MacKenzie (1998) Crystal structure of the family 7 endoglucanase I (Cel7B) from Humicola insolens at 2.2 Å resolution and identification of the catalytic nucleophile by trapping of the covalent glycosyl-enzyme intermediate, Biochem. J. 335:409-416; Sakon (1997) Structure and mechanism of endo/exocellulase E4 from Thermomonospora fusca, Nat. Struct. Biol. 4:810-818; Varrot (1999) Crystal structure of the catalytic core domain of the family 6 cellobiohydrolase II, Cel6A, from Humicola insolens, at 1.92 Å resolution, Biochem. J. 337:297-304; illustrating and identifying specific structural elements as guidance for the routine generation of cellulase variants of the invention, and as guidance for identifying enzyme species within the scope of the invention.

Polypeptides and peptides of the invention can be isolated from natural sources, be synthetic, or be recombinantly generated polypeptides. Peptides and proteins can be recombinantly expressed in vitro or in vivo. The peptides and polypeptides of the invention can be made and isolated using any method known in the art. Polypeptides and peptides of the invention can also be synthesized, whole or in part, using chemical methods well known in the art. See e.g., Caruthers (1980) Nucleic Acids Res. Symp. Ser. 215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232; Banga, A. K., Therapeutic Peptides and Proteins, Formulation, Processing and Delivery Systems (1995) Technomic Publishing Co., Lancaster, Pa. For example, peptide synthesis can be performed using various solid-phase techniques (see e.g., Roberge (1995) Science 269:202; Merrifield (1997) Methods Enzymol. 289:3-13) and automated synthesis may be achieved, e.g., using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer.

The peptides and polypeptides of the invention can also be glycosylated. The glycosylation can be added post-translationally either chemically or by cellular biosynthetic mechanisms, wherein the latter incorporates the use of known glycosylation motifs, which can be native to the sequence or can be added as a peptide or added in the nucleic acid coding sequence. The glycosylation can be O-linked or N-linked.

The peptides and polypeptides of the invention, as defined above, include all “mimetic” and “peptidomimetic” forms. The terms “mimetic” and “peptidomimetic” refer to a synthetic chemical compound which has substantially the same structural and/or functional characteristics as the polypeptides of the invention. The mimetic can be either entirely composed of synthetic, non-natural analogues of amino acids, or can be a chimeric molecule of partly natural peptide amino acids and partly non-natural analogs of amino acids. The mimetic can also incorporate any amount of natural amino acid conservative substitutions as long as such substitutions also do not substantially alter the mimetic's structure and/or activity. As with polypeptides of the invention which are conservative variants or members of a genus of polypeptides of the invention (e.g., having about 50% or more sequence identity to an exemplary sequence of the invention), routine experimentation will determine whether a mimetic is within the scope of the invention, i.e., that its structure and/or function is not substantially altered. Thus, in one aspect, a mimetic composition is within the scope of the invention if it has a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity.

Polypeptide mimetic compositions of the invention can contain any combination of non-natural structural components. In an alternative aspect, mimetic compositions of the invention include one or all of the following three structural groups: a) residue linkage groups other than the natural amide bond (“peptide bond”) linkages; b) non-natural residues in place of naturally occurring amino acid residues; or c) residues which induce secondary structural mimicry, i.e., to induce or stabilize a secondary structure, e.g., a beta turn, gamma turn, beta sheet, alpha helix conformation, and the like. For example, a polypeptide of the invention can be characterized as a mimetic when all or some of its residues are joined by chemical means other than natural peptide bonds. Individual peptidomimetic residues can be joined by peptide bonds, other chemical bonds or coupling means, such as, e.g., glutaraldehyde, N-hydroxysuccinimide esters, bifunctional maleimides, N,N′-dicyclohexylcarbodiimide (DCC) or N,N′-diisopropylcarbodiimide (DIC). Linking groups that can be an alternative to the traditional amide bond (“peptide bond”) linkages include, e.g., ketomethylene (e.g., —C(═O)—CH₂— for —C(═O)—NH—), aminomethylene (CH₂—NH), ethylene, olefin (CH═CH), ether (CH₂—O), thioether (CH₂—S), tetrazole (CN₄—), thiazole, retroamide, thioamide, or ester (see, e.g., Spatola (1983) in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol. 7, pp 267-357, “Peptide Backbone Modifications,” Marcel Dekker, NY).

A polypeptide of the invention can also be characterized as a mimetic by containing all or some non-natural residues in place of naturally occurring amino acid residues. Non-natural residues are well described in the scientific and patent literature; a few exemplary non-natural compositions useful as mimetics of natural amino acid residues and guidelines are described below. Mimetics of aromatic amino acids can be generated by replacement with, e.g., D- or L-naphthylalanine; D- or L-phenylglycine; D- or L-2-thienylalanine; D- or L-1-, 2-, 3-, or 4-pyrenylalanine; D- or L-3-thienylalanine; D- or L-(2-pyridinyl)-alanine; D- or L-(3-pyridinyl)-alanine; D- or L-(2-pyrazinyl)-alanine; D- or L-(4-isopropyl)-phenylglycine; D-(trifluoromethyl)-phenylglycine; D-(trifluoromethyl)-phenylalanine; D-p-fluoro-phenylalanine; D- or L-p-biphenylphenylalanine; D- or L-p-methoxy-biphenylphenylalanine; D- or L-2-indole(alkyl)alanines; and, D- or L-alkylalanines, where alkyl can be substituted or unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl, iso-butyl, sec-isotyl, iso-pentyl, or a non-acidic amino acid. Aromatic rings of a non-natural amino acid include, e.g., thiazolyl, thiophenyl, pyrazolyl, benzimidazolyl, naphthyl, furanyl, pyrrolyl, and pyridyl aromatic rings.

Mimetics of acidic amino acids can be generated by substitution by, e.g., non-carboxylate amino acids while maintaining a negative charge; (phosphono)alanine; sulfated threonine. Carboxyl side groups (e.g., aspartyl or glutamyl) can also be selectively modified by reaction with carbodiimides (R′—N—C—N—R′) such as, e.g., 1-cyclohexyl-3(2-morpholinyl-(4-ethyl) carbodiimide or 1-ethyl-3(4-azonia-4,4-dimethylpentyl) carbodiimide. Aspartyl or glutamyl can also be converted to asparaginyl and glutaminyl residues by reaction with ammonium ions. Mimetics of basic amino acids can be generated by substitution with, e.g., (in addition to lysine and arginine) the amino acids ornithine, citrulline, or (guanidino)-acetic acid, or (guanidino)alkyl-acetic acid, where alkyl is defined above. Nitrile derivatives (e.g., containing the CN-moiety in place of COOH) can be substituted for asparagine or glutamine. Asparaginyl and glutaminyl residues can be deaminated to the corresponding aspartyl or glutamyl residues. Arginine residue mimetics can be generated by reacting arginyl with, e.g., one or more conventional reagents, including, e.g., phenylglyoxal, 2,3-butanedione, 1,2-cyclo-hexanedione, or ninhydrin, in one aspect under alkaline conditions. Tyrosine residue mimetics can be generated by reacting tyrosyl with, e.g., aromatic diazonium compounds or tetranitromethane. N-acetylimidazole and tetranitromethane can be used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively. Cysteine residue mimetics can be generated by reacting cysteinyl residues with, e.g., alpha-haloacetates such as 2-chloroacetic acid or chloroacetamide and corresponding amines; to give carboxymethyl or carboxyamidomethyl derivatives. Cysteine residue mimetics can also be generated by reacting cysteinyl residues with, e.g., bromo-trifluoroacetone, alpha-bromo-beta-(5-imidozoyl) propionic acid; chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide; methyl 2-pyridyl disulfide; p-chloromercuribenzoate; 2-chloromercuri-4-nitrophenol; or, chloro-7-nitrobenzo-oxa-1,3-diazole. Lysine mimetics can be generated (and amino terminal residues can be altered) by reacting lysinyl with, e.g., succinic or other carboxylic acid anhydrides. Lysine and other alpha-amino-containing residue mimetics can also be generated by reaction with imidoesters, such as methyl picolinimidate, pyridoxal phosphate, pyridoxal, chloroborohydride, trinitro-benzenesulfonic acid, O-methylisourea, 2,4-pentanedione, and transamidase-catalyzed reactions with glyoxylate. Mimetics of methionine can be generated by reaction with, e.g., methionine sulfoxide. Mimetics of proline include, e.g., pipecolic acid, thiazolidine carboxylic acid, 3- or 4-hydroxy proline, dehydroproline, 3- or 4-methylproline, or 3,3-dimethylproline. Histidine residue mimetics can be generated by reacting histidyl with, e.g., diethylpyrocarbonate or para-bromophenacyl bromide. Other mimetics include, e.g., those generated by hydroxylation of proline and lysine; phosphorylation of the hydroxyl groups of seryl or threonyl residues; methylation of the alpha-amino groups of lysine, arginine and histidine; acetylation of the N-terminal amine; methylation of main chain amide residues or substitution with N-methyl amino acids; or amidation of C-terminal carboxyl groups.

In one aspect, a residue, e.g., an amino acid, of a polypeptide of the invention can also be replaced by an amino acid (or peptidomimetic residue) of the opposite chirality. In one aspect, any amino acid naturally occurring in the L-configuration (which can also be referred to as the R or S, depending upon the structure of the chemical entity) can be replaced with the amino acid of the same chemical structural type or a peptidomimetic, but of the opposite chirality, referred to as the D-amino acid, but also can be referred to as the R- or S-form.

The invention also provides methods for modifying the polypeptides of the invention by either natural processes, such as post-translational processing (e.g., phosphorylation, acylation, etc.), or by chemical modification techniques, and the resulting modified polypeptides. Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may have many types of modifications. In one aspect, modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, and transfer-RNA mediated addition of amino acids to protein such as arginylation. See, e.g., Creighton, T. E., Proteins: Structure and Molecular Properties 2nd Ed., W.H. Freeman and Company, New York (1993); Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York, pp. 1-12 (1983).

Solid-phase chemical peptide synthesis methods can also be used to synthesize the polypeptide or fragments of the invention. Such methods have been known in the art since the early 1960's (Merrifield, R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (see also Stewart, J. M. and Young, J. D., Solid Phase Peptide Synthesis, 2nd Ed., Pierce Chemical Co., Rockford, Ill., pp. 11-12) and have recently been employed in commercially available laboratory peptide design and synthesis kits (Cambridge Research Biochemicals). Such commercially available laboratory kits have generally utilized the teachings of H. M. Geysen et al, Proc. Natl. Acad. Sci., USA, 81:3998 (1984) and provide for synthesizing peptides upon the tips of a multitude of “rods” or “pins” all of which are connected to a single plate. When such a system is utilized, a plate of rods or pins is inverted and inserted into a second plate of corresponding wells or reservoirs, which contain solutions for attaching or anchoring an appropriate amino acid to the tips of the pins or rods. By repeating such a process step, i.e., inverting and inserting the tips of the rods and pins into appropriate solutions, amino acids are built into desired peptides. In addition, a number of FMOC peptide synthesis systems are available. For example, assembly of a polypeptide or fragment can be carried out on a solid support using an Applied Biosystems, Inc. Model 431A™ automated peptide synthesizer. Such equipment provides ready access to the peptides of the invention, either by direct synthesis or by synthesis of a series of fragments that can be coupled using other known techniques.

The polypeptides of the invention include cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes in an active or inactive form. For example, the polypeptides of the invention include proproteins before “maturation” or processing of prepro sequences, e.g., by a proprotein-processing enzyme, such as a proprotein convertase, to generate an “active” mature protein. The polypeptides of the invention include cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes inactive for other reasons, e.g., before “activation” by a post-translational processing event, e.g., an endo- or exo-peptidase or proteinase action, a phosphorylation event, an amidation, a glycosylation or a sulfation, a dimerization event, and the like. The polypeptides of the invention include all active forms, including active subsequences, e.g., catalytic domains or active sites, of the enzyme.

The invention includes immobilized cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes, anti-cellulase, e.g., anti-endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase antibodies and fragments thereof. The invention provides methods for inhibiting cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, e.g., using dominant negative mutants or anti-cellulase, e.g., anti-endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase antibodies of the invention. The invention includes heterocomplexes, e.g., fusion proteins, heterodimers, etc., comprising the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention.

Polypeptides of the invention can have a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity under various conditions, e.g., extremes in pH and/or temperature, oxidizing agents, and the like. The invention provides methods leading to alternative cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme preparations with different catalytic efficiencies and stabilities, e.g., towards temperature, oxidizing agents and changing wash conditions. In one aspect, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme variants can be produced using techniques of site-directed mutagenesis and/or random mutagenesis. In one aspect, directed evolution can be used to produce a great variety of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme variants with alternative specificities and stability.

The proteins of the invention are also useful as research reagents to identify cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme modulators, e.g., activators or inhibitors of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity. Briefly, test samples (compounds, broths, extracts, and the like) are added to cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme assays to determine their ability to inhibit substrate cleavage. Inhibitors identified in this way can be used in industry and research to reduce or prevent undesired proteolysis. As with cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes, inhibitors can be combined to increase the spectrum of activity.

The enzymes of the invention are also useful as research reagents to digest proteins or in protein sequencing. For example, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes may be used to break polypeptides into smaller fragments for sequencing using, e.g., an automated sequencer.

The invention also provides methods of discovering new cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes using the nucleic acids, polypeptides and antibodies of the invention. In one aspect, phagemid libraries are screened for expression-based discovery of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes. In another aspect, lambda phage libraries are screened for expression-based discovery of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes. Screening of the phage or phagemid libraries can allow the detection of toxic clones; improved access to substrate; reduced need for engineering a host, by-passing the potential for any bias resulting from mass excision of the library; and faster growth at low clone densities. Screening of phage or phagemid libraries can be in liquid phase or in solid phase. In one aspect, the invention provides screening in liquid phase. This gives a greater flexibility in assay conditions; additional substrate flexibility; higher sensitivity for weak clones; and ease of automation over solid phase screening.

The invention provides screening methods using the proteins and nucleic acids of the invention and robotic automation to enable the execution of many thousands of biocatalytic reactions and screening assays in a short period of time, e.g., per day, as well as ensuring a high level of accuracy and reproducibility (see discussion of arrays, below). As a result, a library of derivative compounds can be produced in a matter of weeks. For further teachings on modification of molecules, including small molecules, see PCT/US94/09174; U.S. Pat. No. 6,245,547.

In one aspect, polypeptides or fragments of the invention are obtained through biochemical enrichment or purification procedures. The sequence of potentially homologous polypeptides or fragments may be determined by cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme assays (see, e.g., Examples 1, 2 and 3, below), gel electrophoresis and/or microsequencing. The sequence of the prospective polypeptide or fragment of the invention can be compared to an exemplary polypeptide of the invention, or a fragment, e.g., comprising at least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof, using any of the programs described above.

Another aspect of the invention is an assay for identifying fragments or variants of the invention which retain the enzymatic function of the polypeptides of the invention. For example, the fragments or variants of said polypeptides may be used to catalyze biochemical reactions, which indicate that the fragment or variant retains the enzymatic activity of a polypeptide of the invention. An exemplary assay for determining if fragments or variants retain the enzymatic activity of the polypeptides of the invention includes the steps of: contacting the polypeptide fragment or variant with a substrate molecule under conditions which allow the polypeptide fragment or variant to function, and detecting either a decrease in the level of substrate or an increase in the level of the specific reaction product of the reaction between the polypeptide and substrate.
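The detection step of the exemplary assay can be illustrated with a minimal sketch in Python. The function name, the measurement units and the optional threshold `min_change` are assumptions made only for illustration and are not part of the disclosed assay; the sketch simply reports activity as retained when the substrate level falls or the product level rises between two measurements.

```python
def retains_activity(substrate_before: float, substrate_after: float,
                     product_before: float, product_after: float,
                     min_change: float = 0.0) -> bool:
    """Hypothetical readout: True if substrate decreased or product increased.

    min_change is an assumed noise threshold (same units as the measurements);
    the source text does not specify one.
    """
    substrate_drop = substrate_before - substrate_after
    product_gain = product_after - product_before
    return substrate_drop > min_change or product_gain > min_change


# Illustrative values only (arbitrary units), not experimental data.
active = retains_activity(10.0, 7.5, 0.0, 2.5)
inactive = retains_activity(10.0, 10.0, 0.0, 0.0)
print(active, inactive)
```

In practice the threshold would be chosen from the noise of the particular assay; the default of zero is merely the simplest reading of "a decrease" or "an increase".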

The present invention exploits the unique catalytic properties of enzymes. Whereas the use of biocatalysts (i.e., purified or crude enzymes, non-living or living cells) in chemical transformations normally requires the identification of a particular biocatalyst that reacts with a specific starting compound, the present invention uses selected biocatalysts and reaction conditions that are specific for functional groups that are present in many starting compounds, such as small molecules. Each biocatalyst is specific for one functional group, or several related functional groups, and can react with many starting compounds containing this functional group.

In one aspect, the biocatalytic reactions produce a population of derivatives from a single starting compound. These derivatives can be subjected to another round of biocatalytic reactions to produce a second population of derivative compounds. Thousands of variations of the original small molecule or compound can be produced with each iteration of biocatalytic derivatization.

Enzymes react at specific sites of a starting compound without affecting the rest of the molecule, a process which is very difficult to achieve using traditional chemical methods. This high degree of biocatalytic specificity provides the means to identify a single active compound within the library. The library is characterized by the series of biocatalytic reactions used to produce it, a so-called “biosynthetic history”. Screening the library for biological activities and tracing the biosynthetic history identifies the specific reaction sequence producing the active compound. The reaction sequence is repeated and the structure of the synthesized compound determined. This mode of identification, unlike other synthesis and screening approaches, does not require immobilization technologies, and compounds can be synthesized and tested free in solution using virtually any type of screening assay. It is important to note that the high degree of specificity of enzyme reactions on functional groups allows for the “tracking” of specific enzymatic reactions that make up the biocatalytically produced library.

In one aspect, procedural steps are performed using robotic automation enabling the execution of many thousands of biocatalytic reactions and/or screening assays per day as well as ensuring a high level of accuracy and reproducibility. Robotic automation can also be used to screen for cellulase activity to determine if a polypeptide is within the scope of the invention. As a result, in one aspect, a library of derivative compounds can be produced in a matter of weeks which would take years to produce using “traditional” chemical or enzymatic screening methods.

In a particular aspect, the invention provides a method for modifying small molecules, comprising contacting a polypeptide encoded by a polynucleotide described herein or enzymatically active fragments thereof with a small molecule to produce a modified small molecule. A library of modified small molecules is tested to determine if a modified small molecule which exhibits a desired activity is present within the library. A specific biocatalytic reaction which produces the modified small molecule of desired activity is identified by systematically eliminating each of the biocatalytic reactions used to produce a portion of the library and then testing the small molecules produced in the portion of the library for the presence or absence of the modified small molecule with the desired activity. The specific biocatalytic reactions which produce the modified small molecule of desired activity are optionally repeated. The biocatalytic reactions are conducted with a group of biocatalysts that react with distinct structural moieties found within the structure of a small molecule; each biocatalyst is specific for one structural moiety or a group of related structural moieties; and each biocatalyst reacts with many different small molecules which contain the distinct structural moiety.
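The systematic elimination described above can be sketched in Python under stated assumptions. Here a library is modeled simply as the set of biocatalytic reactions used to produce it, and `has_activity` stands in for whatever screening assay is used; both the function names and the reaction labels in the usage example are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, List


def required_reactions(reactions: List[str],
                       has_activity: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Return the reactions whose omission abolishes the desired activity.

    reactions    -- labels of the biocatalytic reactions used to build the library
    has_activity -- assumed screening readout: given the set of reactions applied,
                    reports whether the desired activity is detected
    """
    full = frozenset(reactions)
    if not has_activity(full):
        return []  # nothing to deconvolute: the full library shows no activity
    required = []
    for reaction in reactions:
        # Rebuild the library portion without this one reaction and re-test.
        if not has_activity(full - {reaction}):
            required.append(reaction)
    return required


# Hypothetical example: the active compound needs both acylation and oxidation.
steps = ["acylation", "glycosylation", "oxidation"]
screen = lambda applied: {"acylation", "oxidation"} <= applied
history = required_reactions(steps, screen)
print(history)
```

The returned list corresponds to the portion of the “biosynthetic history” that is needed for activity; repeating only those reactions would then be the optional confirmation step.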

Cellulase, e.g., Endoglucanase, Cellobiohydrolase and/or Beta-Glucosidase Enzyme Signal Sequences, Prepro and Catalytic Domains

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme signal sequences (e.g., signal peptides (SPs)), prepro domains and catalytic domains (CDs). The SPs, prepro domains and/or CDs of the invention can be isolated, synthetic or recombinant peptides or can be part of a fusion protein, e.g., as a heterologous domain in a chimeric protein. The invention provides nucleic acids encoding these catalytic domains (CDs), prepro domains and signal sequences (SPs, e.g., a peptide having a sequence comprising/consisting of amino terminal residues of a polypeptide of the invention).

The invention provides isolated, synthetic or recombinant signal sequences (e.g., signal peptides) consisting of or comprising the sequence of (a sequence as set forth in) residues 1 to 14, 1 to 15, 1 to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 29, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38, 1 to 40, 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46, or 1 to 47, or more, of a polypeptide of the invention, e.g., exemplary polypeptides of the invention, see also Table 3, Examples 1 and 4, below, and Sequence Listing.

In one aspect, the invention provides signal sequences comprising the first 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70 or more amino terminal residues of a polypeptide of the invention.

For example, Table 3, below, sets forth exemplary signal (leader) sequences of the invention; e.g., the polypeptide having the sequence of SEQ ID NO:2, encoded, e.g., by SEQ ID NO:1, has a signal sequence comprising (or consisting of) the amino terminal 18 residues of SEQ ID NO:2, i.e., MYKQLALASLSLFGLVNA. Additional exemplary signal sequences are similarly set forth in Table 3 (these are exemplary signal sequences, and the invention is not limited to these exemplary sequences; for example, another signal sequence for SEQ ID NO:2 may be MYKQLALASLSLFGLVN, etc.). Table 3 also sets forth other information regarding the exemplary sequences of the invention. For example, in the first row, SEQ ID NOs:1, 2 represent the exemplary polypeptide of the invention having a sequence as set forth in SEQ ID NO:2, encoded by, e.g., SEQ ID NO:1; this exemplary sequence has cellobiohydrolase activity; a signal sequence is predicted to be MYKQLALASLSLFGLVNA (amino acids 1 through 18 at the amino terminal end); this exemplary sequence was initially isolated from an environmental sample, therefore it is classified as being from an unknown source; and the “EC” number for this cellobiohydrolase enzyme is 3.2.1.91 (an EC number is the number assigned to a type of enzyme according to a scheme of standardized enzyme nomenclature developed by the Enzyme Commission of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology, or IUBMB).
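The relationship between a predicted cleavage position and the resulting signal sequence can be sketched in Python. The sketch assumes that the “AA1: 18 AA2: 19” entry for SEQ ID NOs:1, 2 means cleavage between residues 18 and 19; the function name is hypothetical, and because the full sequence of SEQ ID NO:2 is not reproduced in this section, the residues after the signal sequence are represented by a placeholder string of X characters.

```python
from typing import Tuple


def split_signal_peptide(sequence: str, cleavage_after: int) -> Tuple[str, str]:
    """Split a precursor into (signal sequence, remainder).

    cleavage_after is the 1-based index of the last signal-sequence residue
    (assumed to correspond to AA1 in Table 3).
    """
    if not 0 < cleavage_after < len(sequence):
        raise ValueError("cleavage position must fall inside the sequence")
    return sequence[:cleavage_after], sequence[cleavage_after:]


# Signal sequence stated in the text for SEQ ID NO:2.
SIGNAL_SEQ_ID_NO_2 = "MYKQLALASLSLFGLVNA"
# Placeholder only: NOT the real mature sequence of SEQ ID NO:2.
PLACEHOLDER_MATURE = "XXXXXXXXXX"

precursor = SIGNAL_SEQ_ID_NO_2 + PLACEHOLDER_MATURE
signal, mature = split_signal_peptide(precursor, 18)
shorter_signal, _ = split_signal_peptide(precursor, 17)
print(signal, len(signal))
print(shorter_signal)
```

Calling the function with 17 instead of 18 yields the alternative, shorter signal sequence mentioned in the text, which illustrates why the ranges of amino terminal residues recited above are given as a series rather than a single value.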

TABLE 3 Signalp  Predicted Predicted SEQ ID Cleavage Site   Signal ECNOs: Activity (AA = Amino Acid) Sequence Source Number 1, 2Cellobiohydrolase Probability: 0.999 MYKQLALASLSL Unknown 3.2.1.91AA1: 18 AA2: 19 FGLVNA 101, 102 Clostridium 3.2.1.8 thermocellum103, 104 Cochliobolus 3.2.1.55 heterostrophus ATCC 48331 105, 106Probability: 1.000 MKKIVSLVCVLV Clostridium 3.2.1.4 AA1: 25 AA2: 26MLVSILGSFSVV thermocellum A 107, 108 Unknown 3.2.1.4 109, 110Probability: 0.984 MSSFKASAINPR Unknown 3.2.1.4 AA1: 39 AA2: 40MAGALTRSLYA AGFSLAVSTLST QAYA 11, 12 CellobiohydrolaseProbability: 1.000 MQRTSAWALLL Unknown 3.2.1.91 AA1: 18 AA2: 19 LAQIATA111, 112 Probability: 0.984 MSSFKASAINPR Unknown 3.2.1.4 AA1: 39 AA2: 40MAGALTRSLYA AGFSLAVSTLST QAYA 113, 114 Probability: 0.985 MSSFKASAINPRUnknown 3.2.1.4 AA1: 39 AA2: 40 MAGTLTRSLYAA GFSLAVSTLSTQ AYA 115, 116Probability: 1.000 MRKIILKFCALM Unknown 3.2.1.4 AA1: 29 AA2: 30MVVILIVSILQILP VFA 117, 118 Probability: 1.000 MKKRQGFIKKGL Unknown3.2.1.8 AA1: 34 AA2: 35 VLGVSLLLLALI MMSATSQTSA 119, 120Probability: 1.000 MSRNIRKSSFIFS Unknown 3.2.1.91 AA1: 33 AA2: 34LLTIIVLIASMFLQ TQTAQA 121, 122 Probability: 1.000 MLIRLAAAGALL Unknown3.2.1.8 AA1: 28 AA2: 29 LGAVFVAVSPAA AATA 123, 124 Probability: 1.000MRLKTLATATAA Unknown 3.2.1.91 AA1: 29 AA2: 30 AAVVAGTAVLW PGSASA125, 126 Probability: 0.999 MFKRNTVRVGCF Unknown 3.2.1.4 AA1: 32 AA2: 33IAVTAICAMLLFH VPSVVSA 127, 128 Probability: 1.000 MSGEPHVSLRLS Unknown3.2.1.8 AA1: 42 AA2: 43 RPRRRTAILAAV AACTVTAGAWL ATGTASA 129, 130Probability: 1.000 MKRSLCLVLVLL Unknown 3.2.1.4 AA1: 23 AA2: 24VILSACVQNYA 13, 14 B-glucosidase Unknown 3.2.1.21 131, 132Probability: 0.994 MMKGFRWCVM Unknown 3.2.1.4 AA1: 22 AA2: 23AMVVMAATNVR A 133, 134 Probability: 1.000 MPTGLRAKPCLT Unknown 3.2.1.4AA1: 35 AA2: 36 RWLAASACALAP LLLGAPASALA 135, 136 Probability: 0.999MSMITPKTKSYG Unknown 3.2.1.4 AA1: 24 AA2: 25 LAAMLSLGLAVA 137, 138Probability: 0.999 MKFPLQFLPLIFF Unknown 3.2.1.4 AA1: 33 AA2: 34RVLRLCLIPLLVF TSNFSVA 139, 140 
Probability: 0.961 MPLCTTKHLLPA UnknownAA1: 34 AA2: 35 LVLVTASSFMLG CGNSTTPAEK 141, 142 Probability: 1.000MNNKLPGRSAN Unknown 3.2.1.4 AA1: 41 AA2: 42 GKRHSSPRKSILN SVIGILAGSLLSGLALA 143, 144 Clostridium thermocellum 145, 146 Clostridium 3.2.1.4thermocellum 147, 148 β-glucosidase Unknown 149, 150 β-glucosidaseUnknown 3.2.1.23 15, 16 B-glucosidase Probability: 1.000 MKIRSLLLLISILLUnknown 3.2.1.21 AA1: 22 AA2: 23 GVVSPGFG 151, 152 Probability: 0.973MRNFFKVFTLVL Unknown 3.2.1.4 AA1: 21 AA2: 22 VVISVMLFG 153, 154Probability: 0.997 MKRRNWNYLLII Unknown 3.2.1.4 AA1: 25 AA2: 26LLVISAFTLISAQ 155, 156 Probability: 1.000 MKRVAIILSVLTL Unknown 3.2.1.4AA1: 24 AA2: 25 LATIMSFPASA 157, 158 Probability: 1.000 MNVLRSGIVTMLUnknown 3.2.1.4 AA1: 21 AA2: 22 LLAAFSVQA 159, 160 Probability: 1.000MKRSISIFITCLLI Unknown 3.2.1.4 AA1: 29 AA2: 30 TLLTMGGMIASP ASA 161, 162Probability: 1.000 MRGQKLWVALA Unknown 3.2.1.4 AA1: 22 AA2: 23ALLSVGSVALA 163, 164 β-glucosidase Probability: 1.000 MSKIKHPKLLLCUnknown 3.2.1.21 AA1: 27 AA2: 28 LLPVFGMFNCQN VLS 165, 166Probability: 0.937 MSKAKLYVILLF Unknown 3.2.1.4 AA1: 17 AA2: 18 FSVVA167, 168 Probability: 1.000 MFIGFRLSIASLL Unknown AA1: 24 AA2: 25ACLLFPMQALA 169, 170 Probability: 1.000 MRCPSLLSIKNLL Unknown 3.2.1.4AA1: 26 AA2: 27 ALIGILLTAPVFA 17, 18 B-glucosidase Unknown 3.2.1.21171, 172 β-glucosidase Unknown 3.2.1.21 173, 174 Probability: 0.993MKSVLALALIVSI Unknown 3.2.1.4 AA1: 19 AA2: 20 NLVLLA 175, 176Probability: 0.924 MTFEKPIFERFRL Unknown 3.2.1.4 AA1: 36 AA2: 37PSCKPVILGLLSL ALAACGGVPG 177, 178 Probability: 1.000 MLVYRVSIQKHLUnknown 3.2.1.4 AA1: 46 AA2: 47 ASLTVLVSLLLIL AGCSSSSDSIAPV SSSSVSSA179, 180 β-glucosidase Probability: 1.000 MKKRIISAGLTFL Unknown 3.2.1.21AA1: 22 AA2: 23 IGVSLQAQS 181, 182 Probability: 1.000 MKRTILRFSKFLKUnknown 3.2.1.4 AA1: 29 AA2: 30 IVILITFTLQIFTVF A 183, 184Probability: 0.999 MREIILKSGALL Unknown 3.2.1.4 AA1: 29 AA2: 30MVVILIVSILQILT VFA 185, 186 Probability: 0.999 MREIILKSGALL Unknown3.2.1.4 AA1: 29 AA2: 30 
MVVILIVSILQILT VFA 187, 188 β-glucosidaseThermosphaera 3.2.1.23 aggregans M11TL 189, 190 Probability: 0.997MKGFRWCVLAV Unknown 3.2.1.4 AA1: 22 AA2: 23 LMLAATNLRAA 19, 20Cellobiohydrolase Probability: 0.972 MLRYLSIVAATAI Unknown 3.2.1.91AA1: 19 AA2: 20 LTGVEA 191, 192 Probability: 1.000 MKKITRCCTLICA Unknown3.2.1.4 AA1: 24 AA2: 25 AIMLLNCSSSA 193, 194 Probability: 1.000MRKPACATLAV Unknown 3.2.1.4 AA1: 23 AA2: 24 MMSLLFTPFSQA 195, 196Probability: 1.000 MKRSISVFIACFM Unknown 3.2.1.4 AA1: 29 AA2: 30VAALGISGIIAPK AAA 197, 198 Unknown 3.2.1.4 199, 200 Probability: 0.979MKFFTVLLFFLSF Aquifex aeolicus 3.2.1.4 AA1: 16 AA2: 17 VFS 201, 202Probability: 0.981 MFPRLSPSRFRQ Unknown AA1: 29 AA2: 30 VTLTLLTLGLVSLTGCA 203, 204 Probability: 1.000 MWQRSKTLVLV Unknown 3.2.1.4AA1: 22 AA2: 23 LGLLLSHQAFA 205, 206 Probability: 1.000 MGTSLMIKSTLTBacteria 3.2.1.91 AA1: 30 AA2: 31 GMITAVAAAVFT TSAAFA 207, 208 Unknown3.2.1.4 209, 210 Probability: 1.000 MLVRLLIAMTVL Unknown 3.2.1.4AA1 : 19 AA2: 20 FSAFAHA 21, 22 Cellobiohydrolase Probability: 0.997MLTLAFLSLLAA Unknown AA1: 17 AA2: 18 ANAQK 211, 212 β-glucosidaseProbability: 1.000 MLTRRELIAATA Unknown 3.2.1.21 AA1: 23 AA2: 24LGLAASTKLVA 213, 214 Unknown 3.2.1.4 215, 216 Unknown 3.2.1.4 217, 218Probability: 0.995 MRFRKNFAVLM Unknown 3.2.1.4 AA1: 27 AA2: 28LIVLISTLFLSTQC KG 219, 220 Probability: 0.950 MDELESSCAFPM Unknown3.2.1.4 AA1: 30 AA2: 31 SRYLLLWVWVM LSSSAFA 221, 222 Probability: 0.990MSSVASLLSLTLL Unknown AA1: 17 AA2: 18 QAQA 223, 224 Probability: 1.000MMRTLVTSAFAC Unknown AA1: 24 AA2: 25 LLLPLGTGQADA 225, 226 β-glucosidaseProbability: 0.999 MNCTMKPMTRA Unknown 3.2.1.21 AA1: 29 AA2: 30VAGGLAALALA ACGSSDS 227, 228 β-glucosidase Probability: 1.000MTDRDVSRRALL Unknown 3.2.1.21 AA1: 25 AA2: 26 SLAAVAAATPAV A 229, 230β-glucosidase Probability: 1.000 MNRRELLASTLA Unknown 3.2.1.21AA1: 23 AA2: 24 FSAASALPAAA 23, 24 B-glucosidase Unknown 3.2.1.21231, 232 Unknown 233, 234 Unknown 3.2.1.78 235, 236 Probability: 0.994MRPVILAA1TMA Unknown 3.2.1.4 AA1: 
21 AA2: 22 LSLFVSCSS 237, 238 Unknown3.2.1.78 239, 240 Unknown 3.2.1.4 241, 242 Probability: 1.000MNVLRSGLVTM Unknown 3.2.1.4 AA1: 21 AA2: 22 LLLAAFSVQA 243, 244Probability: 1.000 MKSKVKMFFAA Unknown 3.2.1.89 AA1: 24 AA2: 25AIVWSACSSTGY A 245, 246 Probability: 1.000 MGKISKYFAMFL Unknown 3.2.1.89AA1: 32 AA2: 33 AFLMVFSSLFVN FQPRNVQA 247, 248 Unknown 3.2.1.89 249, 250β-glucosidase Thermococcus 3.2.1.23 AEPII1a 25, 26 CellobiohydrolaseProbability: 1.000 MFSKTALLSSIFA Unknown AA1: 18 AA2: 19 AAATA 251, 252β-glucosidase Thermococcus 3.2.1.23 AEPII1a 253, 254 β-glucosidaseThermotoga 3.2.1.21 maritima MSB8 255, 256 Probability: 1.000MKVFRNSIIRKSV Unknown AA1: 30 AA2: 31 VLFCAVLWILPA GLSLA 257, 258Probability: 1.000 MKRSVSIFIACLV Unknown 3.2.1.4 AA1: 29 AA2: 30MTVLTISGVAAP EASA 259, 260 Unknown 3.2.1.4 261, 262 Probability: 0.892MSKKKFVIVSILT Pyrococcus 3.2.1.4 AA1: 19 AA2: 20 ILLVQA furiosus VC1263, 264 β-glucosidase Pyrococcus 3.2.1.23 furiosus VC1 265, 266β-glucosidase Bacteria 3.2.1.21 267, 268 Probability: 1.000 MNPRSLRRRTTABacteria 3.2.1.91 AA1: 30 AA2: 31 ALAALAACAALL ATQAQA 269, 270Probability: 1.000 MRRRIRALVAAL Bacteria AA1: 27 AA2: 28 SALPLALVVAPSAHA 27, 28 Cellobiohydrolase Probability: 0.997 MLLSAATLIAFA Unknown3.2.1.91 AA1: 22 AA2: 23 AGAIGAPAST 271, 272 β-glucosidase Bacteria3.2.1.21 273, 274 Probability: 0.879 MDAGDMNMKFE Unknown 3.2.1.4AA1: 34 AA2: 35 NRIGRFTRWCSL VAIVGVAPAFA 275, 276 Probability: 0.987MFGNNKTVRLT Unknown AA1: 34 AA2: 35 VVSGLTMLAAG CATAPCEQPVAA 277, 278Probability: 1.000 MKTKSIYSIAILSI Unknown 3.2.1.4 AA1: 26 AA2: 27ALFFFTTAQTFS 279, 280 Probability: 1.000 MKKLILTLFSLW Unknown 3.2.1.4AA1: 18 AA2: 19 AISAYA 281, 282 Unknown 3.2.1.91 283, 284Probability: 0.988 MRKIVKQINYLT Unknown 3.2.1.4 AA1: 31 AA2: 32PSVLGLLVLSLFF QVPTQA 285, 286 Probability: 1.000 MKKVSNARVLSF Unknown3.2.1.4 AA1: 28 AA2: 29 LLILVLIFGNLAS VFA 287, 288 Probability: 1.000MTRNWLGKILA Unknown 3.2.1.4 AA1: 24 AA2: 25 ALLLAGCAIPAP A 289, 290Unknown 3.2.1.4 29, 30 B-glucosidase 
Unknown 3.2.1.21 291, 292Probability: 0.998 MRRFRVVFLGLF Unknown AA1: 29 AA2: 30 VFFGIVIASQYGQTAAA 293, 294 Probability: 1.000 MKKIILKSGILLL Unknown 3.2.1.4AA1: 29 AA2: 30 VVILIVSILQILPV FA 295, 296 Probability: 0.983MKRTRYGVRSPR Unknown AA1: 33 AA2: 34 SAPRFGVLFGAA AAGVLMTGA 297, 298β-glucosidase Probability: 1.000 MRKLLTALLVTV Unknown 3.2.1.21AA1: 18 AA2: 19 AIGANA 299, 300 Probability: 1.000 MTSKHFFKITLM Unknown3.2.1.4 AA1: 22 AA2: 23 SILLFTTTLA 3, 4 B-glucosidase Probability: 1.000MLSNRRLIRTIPL Unknown 3.2:1.21 AA1: 33 AA2: 34 GAAAYSVLLGLA GCSQSTVA301, 302 Probability: 1.000 MFQSLKMRTLSF Unknown 3.2.1.4 AA1: 32 AA2: 33LLLMALLASFLA LPTDVAHA 303, 304 β-glucosidase Probability: 0.989MALSTVSKVMLL Unknown 3.2.1.21 AA1: 27 AA2: 28 TCAAVLLTIPGC NSA 305, 306Probability: 0.825 MAIGISATMLLA Unknown 3.2.1.4 AA1: 17 AA2: 18 MPQQA307, 308 Probability: 0.847 MSCRTLMSRRVG Unknown 3.2.1.4 AA1: 30 AA2: 31WGLLLWGGLFL RTGSVTG 309, 310 β-glucosidase Unknown 3.2.1.52 31, 32B-glucosidase Unknown 3.2.1.21 311, 312 β-glucosidase Unknown 3.2.1.21313, 314 Probability: 1.000 MKTKAVVLSLLL Unknown 3.2.1.4 AA1: 25 AA2: 26LLSMFGPMGAER A 315, 316 Clostridium 3.2.1.4 thermocellum 317, 318Unknown 3.2.1.4 319, 320 β-glucosidase Clostridium 3.2.1.21 thermocellum321, 322 β-glucosidase Unknown 3.2.1.21 323, 324 β-glucosidase Unknown3.2.1.21 325, 326 β-glucosidase Unknown 3.2.1.21 327, 328 β-glucosidaseUnknown 3.2.1.21 329, 330 β-glucosidase Unknown 3.2.1.21 33, 34Cellobiohydrolase Probability: 1.000 MYRILATASALL UnknownAA1: 20 AA2: 21 ATARAQQA 331, 332 β-glucosidase Probability: 1.000MNKILKLFSSLLL Unknown 3.2.1.4 AA1: 23 AA2: 24 FAGICPALQA 333, 334β-glucosidase Unknown 3.2.1.21 335, 336 β-glucosidase Unknown 3.2.1.21337, 338 Probability: 1.000 MSRGILILVMLSV Unknown 3.2.1.4AA1: 20 AA2: 21 LSGAALA 339, 340 β-glucosidase Unknown 3.2.1.21 341, 342β-glucosidase Unknown 3.2.1.21 343, 344 β-glucosidase Unknown 3.2.1.4345, 346 β-glucosidase Unknown 3.2.1.21 347, 348 β-glucosidase Unknown3.2.1.21 349, 350 
β-glucosidase Unknown 3.2.1.21 35, 36Cellobiohydrolase Probability: 0.994 MYQKLAAISAFL UnknownAA1: 20 AA2: 21 AAARAQQV 351, 352 β-glucosidase Unknown 3.2.1.21353, 354 Probability: 1.000 MTRRSIVRSSSNK Unknown 3.2.1.91AA1: 29 AA2: 30 WLVLAGAALLA CTALG 355, 356 β-glucosidase Unknown3.2.1.21 357, 358 β-glucosidase Unknown 3.2.1.21 359, 360Probability: 0.999 MRNHLNVPFYFI Unknown 3.2.1.4 AA1: 29 AA2: 30FFFLIASIFTVCSS STA 361, 362 β-glucosidase Unknown 3.2.1.21 363, 364β-glucosidase Unknown 3.2.1.21 365, 366 β-glucosidase Unknown 3.2.1.21367, 368 β-glucosidase Unknown 3.2.1.21 369, 370 β-glucosidase Unknown37, 38 Endoglucanase Probability: 1.000 MPKKLLASFIALF Unknown 3.2.1.4AA1: 20 AA2: 21 FAANAAA 371, 372 Probability: 1.000 MSSKQKTVAIFVThermococcus 3.2.1.4 AA1: 29 AA2: 30 LFVALAGVAGSI AEPII1a PASYA 373, 374β-glucosidase Probability: 0.986 MNCTLKPMARV Unknown 3.2.1.21AA1 : 29 AA2: 30 VAGCVATLALA ACGSDTG 375, 376 β-glucosidaseProbability: 1.000 MSLFRPHPLKTA Unknown 3.2.1.21 AA1: 27 AA2: 28LATVLLGALTGQ ALA 377, 378 β-glucosidase Probability: 0.567 MTVEEKVNMVVUnknown 3.2.1.21 AA1: 29 AA2: 30 GGGMFVPGMQM PGAAAQA 379, 380β-glucosidase Probability: 1.000 MKKAFMILGAAL Unknown 3.2.1.21AA1: 19 AA2: 20 VTLGASA 381, 382 β-glucosidase Unknown 3.2.1.21 383, 384β-glucosidase Unknown 3.2.1.21 385, 386 β-glucosidase Unknown 3.2.1.21387, 388 β-glucosidase Bacillus sp. 
3.2.1.21 G5308 389, 390β-glucosidase Unknown 3.2.1.21 39, 40 Endoglucanase Unknown 391, 392β-glucosidase Unknown 3.2.1.21 393, 394 β-glucosidase Unknown 3.2.1.21395, 396 β-glucosidase Probability: 1.000 MSHSKKLILTGSL Thermotoga3.2.1.21 AA1: 28 AA2: 29 SAVALCAMMLT maritima MSB8 PATA 397, 398β-glucosidase Unknown 3.2.1.21 399, 400 β-glucosidase Unknown 3.2.1.21401, 402 β-glucosidase Probability: 0.962 MNATLRISLILLI Unknown 3.2.1.21AA1: 19 AA2: 20 MVSGYA 403, 404 β-glucosidase Unknown 3.2.1.21 405, 406β-glucosidase Unknown 3.2.1.21 407, 408 β-glucosidase Unknown 3.2.1.21409, 410 β-glucosidase Unknown 3.2.1.21 41, 42 B-glucosidase Unknown3.2.1.21 411, 412 β-glucosidase Probability: 0.998 MSCFAKRFTPKL Unknown3.2.1.21 AA1: 26 AA2: 27 LTVLTTFIAMACF A 413, 414 β-glucosidaseProbability: 1.000 MKYLRPLSVFLC Unknown 3.2.1.21 AA1: 28 AA2: 29LVVVLALLLSTPP SSA  415, 416 Probability: 0.996 MLIIGGLLVLLGF Unknown3.2.1.4 AA1: 20 AA2: 21 SSCGRQA 417, 418 β-glucosidase Unknown 3.2.1.21419, 420 β-glucosidase Unknown 3.2.1.21 421, 422 β-glucosidase Unknown3.2.1.21 423, 424 β-glucosidase Probability: 1.000 MNHAARRRTLL Unknown3.2.1.21 AA1: 29 AA2: 30 GLGTALAGATLL PRGAAA 425, 426 β-glucosidaseUnknown 3.2.1.52 427, 428 Thermotoga 3.2.1.4 maritima MSB8 429, 430Thermotoga 3.2.1.4 maritima MSB8 43, 44 Cellobiohydrolase Unknown3.2.1.91 431, 432 β-glucosidase Unknown 3.2.1.21 433, 434PFAM:galactopyranose mutase Unknown 5.4.99.9 435, 436 β-glucosidaseUnknown 3.2.1.21 437, 438 β-glucosidase Probability: 0.979 MTTFNVSAVATAUnknown 3.2.1.21 AA1: 25 AA2: 26 PAPTASTTRPAA A 439, 440Probability: 1.000 MTHKTKSIASLSL Unknown 3.5.2.6 AA1: 25 AA2: 26ILMLLAVPLALA 441, 442 Thermotoga 3.2.1.139 maritima MSB8 443, 444Probability: 1.000 MNFSLRKAAAAL Unknown 3.2.1.8 AA1: 25 AA2: 26ACVAGLYASSAG A 445, 446 Probability: 0.981 MSAALSYRIYKNAgaricus bisporus ATCC 62489 AA1: 25 AA2: 26 ALLFTAFLTAAR A 447, 448Probability: 1.000 MIVGFSFMLLLPL Unknown 3:2.1.8 AA1: 20 AA2: 21 GMTNALA449, 450 Probability: 1.000 MRFPSIFTAVLFA 
Cochliobolus 3.2.1.91AA1: 19 AA2: 20 ASSALA heterostrophus ATCC 48331 45, 46Cellobiohydrolase Probability: 0.965 MSLLLTALSLVA UnknownAA1: 16 AA2: 17 AAKA 451, 452 Probability: 0.952 MYRVIATASALI UnknownAA1: 17 AA2: 18 ATARA 453, 454 Unknown 3.2.1.21 455, 456 Unknown3.2.1.55 457, 458 Unknown 3.2.1. 459, 460 Bacillus 3.2.1.55licheniformis 461, 462 Unknown 3.2.1.55 463, 464 Unknown 3.2.1.55465, 466 Bacillus 3.2.1.55 halodurans ATCC 467, 468 Thermotoga 3.2.1.55maritima MSB8 469, 470 Unknown 3.2.1.55 47, 48 EndoglucanaseProbability: 1.000 MRKNILMLAVA Unknown 3.2.1.4 AA1: 28 AA2: 29MIAAMCLTTSCG NKAQK 471, 472 Unknown 3.2.1.55 473, 474 Unknown 3.2.1.55475, 476 Unknown 3.2.1.55 477, 478 Unknown 3.2.1.55 479, 480Probability: 1.000 MKTFILAAAALG Unknown 3.2.1.37 AA1: 19 AA2: 20 VAMPGVA481, 482 Unknown 3.2.1.55 483, 484 Unknown 3.2.1.55 485, 486 Unknown3.2.1.55 487, 488 Unknown 3.2.1.55 489, 490 Unknown 3.2.1.55 49, 50B-glucosidase Probability: 0.999 MKTTKAVTLLA Unknown 3.2.1.21AA1: 24 AA2: 25 MGGALFALTAC NG 491, 492 Unknown 3.2.1.21 493, 494Unknown 3.2.1.21 495, 496 Unknown 497, 498 Unknown 3.2.1.37 499, 500Probability: 0.998 MKKRAFSFSLCV Unknown 3.2.1.21 AA1: 25 AA2: 26AIISTFWLPVAH M 5, 6 B-glucosidase Unknown 3.2.1.21 501, 502 Unknown503, 504 Unknown 505, 506 Unknown 507, 508 Probability: 0.993MQNRREFLQLLF Unknown 3.2.1. AA1: 27 AA2: 28 AGAGAGLVLPQI SFG 509, 510Probability: 0.926 MTTRREFIRDLL Unknown 3.2.1. AA1: 31 AA2: 32VGGVVVAVAPR FLAFSSVA 51, 52 Cellobiohydrolase Probability: 1.000MKGSISYQIYKG Unknown AA1: 27 AA2: 28 ALLLSSLLASVSA QG 511, 512Probability: 0.976 MINRRDFIKDLIIT Unknown 3.2.1. AA1: 27 AA2: 28SAGVAVLPQLAF G. 513, 514 Probability: 1.000 MSSRREFIRDLLT Unknown 3.2.1.AA1: 28 AA2: 29 GGALIAVAPRLS AFA 515, 516 Cochliobolus 3.2.1.21heterostrophus ATCC 48331 517, 518 Probability: 1.000 MTTTRRTILKAAUnknown 3.2.1. 
AA1: 34 AA2: 35 ASAGAIASTGWP ALAAAQAAQA 53, 54Cellobiohydrolase Probability: 0.995 MSALNSFNMYKS UnknownAA1: 23 AA2: 24 ALILGSLLATA 55, 56 Cellobiohydrolase Probability: 0.999MYRTLAIASSILA Unknown AA1: 20 AA2: 21 VAQGQLA 57, 58 B-glucosidaseProbability: 0.996 MTTRVGRCAQA Unknown 3.2.1.21 AA1: 30 AA2: 31KLLLGFCALALA SCQTATT 59, 60 Cellobiohydrolase Unknown 3.2.1.91 61, 62Endoglucanase Unknown 3.2.1.4 63, 64 CellobiohydrolaseProbability: 0.999 MKGLYTALVAS Unknown 3.2.1.91 AA1: 18 AA2: 19 AISGALA65, 66 Cellobiohydrolase Probability: 1.000 MAVKNILLAAAA Unknown3.2.1.91 AA1: 18 AA2: 19 LSASVA 67, 68 CellobiohydrolaseProbability: 0.999 MKSATLFALAAT Unknown 3.2.1.91 AA1: 15 AA2: 16 AQA69, 70 B-glucosidase Unknown 3.2.1.21 7, 8 B-glucosidaseProbability: 0.999 MNREVPTVSPRP Unknown 3.2.1.21 AA1: 28 AA2: 29LLVGMIAVLLAA PAAA 71, 72 Cellobiohydrolase Probability: 1.000MKTATLLALAAT Unknown 3.2.1.91 AA1: 15 AA2: 16 AQA 73, 74Cellobiohydrolase Probability: 0.999 MLSRTLFLASLLS UnknownAA1: 18 AA2: 19 TSLVA 75, 76 B-glucosidase  Unknown 3.2.1.21 77, 78Cellobiohydrolase Probability: 0.998 MYQRALLFSAL Unknown AA1: 19 AA2: 20MAGVSAQQ 79, 80 B-glucosidase Unknown 3.2.1.23 81, 82 CellobiohydrolaseProbability: 1.000 MQRTSAWALLL Unknown 3.2.1.91 AA1: 18 AA2: 19 LAQIATA83, 84 Cellobiohydrolase Probability: 1.000 MYRRAVLFSALA UnknownAA1: 17 AA2: 18 AAAHA 85, 86 Endoglucanase Probability: 1.000MRKNILMLAVA Unknown 3.2.1.4 AA1: 28 AA2: 29 MIAAMCVTTSCG NKAQK 87, 88Cellobiohydrolase Probability: 0.963 MLPLVLLSLLGA UnknownAA1: 15 AA2: 16 VTA 89, 90 B-glucosidase Unknown 3.2.1.21 9, 10Cellobiohydrolase  Probability: 0.995 MSALNSFNMYKS UnknownAA1: 23 AA2: 24 ALILGSLLATA 91, 92 Probability: 0.999 MTSGRNTCVCLLUnknown 3.2.1.55 AA1: 28 AA2: 29 LIVLAIGLLSKPP ASA 93, 94 B-glucosidaseBacteria 3.2.1.21 95, 96 Unknown 97, 98 Probability: 0.994 MRYTWSVAAALUnknown 3.2.1.91 AA1: 18 AA2: 19 LPCAIQA 99, 100 Probability: 1.000MISLKRVAALLC Unknown 3.2.1.8 AA1: 23 AA2: 24 VAGLGMSAANA 519, 520Oligomerase Cochliobolus 
3.2.1.21 heterostrophus ATCC 48331 521, 522Oligomerase Cochliobolus 3.2.1.21 heterostrophus ATCC 48331 523, 524xylanase Probability: 1.000 MFMLSKKILMVL Unknown 3.2.1.8 AA1: 29 AA2: 30LTISMSFISLFTVT AYA

The invention includes polypeptides with or without a signal sequence (e.g., as described above and/or set forth in Table 3) and/or a prepro sequence. The invention includes polypeptides with heterologous signal sequences and/or prepro sequences (for example, polypeptides of the invention include enzymes where their endogenous signal (leader) sequence is replaced with a heterologous leader (signal) sequence from another similar enzyme or from a completely different source). The prepro sequence (including a sequence of the invention used as a heterologous prepro domain) can be located on the amino terminal or the carboxy terminal end of the protein.

The invention also includes isolated, synthetic or recombinant signal sequences, prepro sequences and catalytic domains (e.g., “active sites”) comprising sequences of the invention. The polypeptide comprising a signal sequence of the invention can be a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention or another cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme or another enzyme or other polypeptide. Methods for identifying “prepro” domain sequences and signal sequences are well known in the art, see, e.g., Van de Ven (1993) Crit. Rev. Oncog. 4(2):115-136. For example, to identify a prepro sequence, the protein is purified from the extracellular space and the N-terminal protein sequence is determined and compared to the unprocessed form.

The cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme signal sequences (SPs) and/or prepro sequences of the invention can be isolated, synthetic or recombinant peptides, or sequences joined to another cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme or a non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non-beta-glucosidase polypeptide, e.g., as a fusion (chimeric) protein. In one aspect, the invention provides polypeptides comprising cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme signal sequences of the invention. In one aspect, polypeptides comprising cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme signal sequences (SPs) and/or prepro sequences of the invention comprise sequences heterologous to a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention (e.g., a fusion protein comprising an SP and/or prepro of the invention and sequences from another cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme or a non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non-beta-glucosidase protein). In one aspect, the invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention with heterologous SPs and/or prepro sequences, e.g., sequences with a yeast signal sequence. An oligomerase or a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention can comprise a heterologous SP and/or prepro in a vector, e.g., a pPIC series vector (Invitrogen, Carlsbad, Calif.).

In one aspect, SPs and/or prepro sequences of the invention are identified following identification of novel cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase polypeptides. The pathways by which proteins are sorted and transported to their proper cellular location are often referred to as protein targeting pathways. One of the most important elements in all of these targeting systems is a short amino acid sequence at the amino terminus of a newly synthesized polypeptide called the signal sequence. This signal sequence directs a protein to its appropriate location in the cell and is removed during transport or when the protein reaches its final destination. Most lysosomal, membrane, or secreted proteins have an amino-terminal signal sequence that marks them for translocation into the lumen of the endoplasmic reticulum. The signal sequences can vary in length from about 10 to 65, or more, amino acid residues. Various methods of recognition of signal sequences are known to those of skill in the art. For example, in one aspect, novel cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme signal peptides are identified by a method referred to as SignalP. SignalP uses a combined neural network which recognizes both signal peptides and their cleavage sites (Nielsen (1997) “Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.” Protein Engineering 10:1-6).
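By way of illustration only, the relationship between a predicted cleavage site and the resulting signal peptide can be sketched in Python. The sketch assumes that the AA1 and AA2 values of Table 3 are the 1-based positions of the residues flanking the predicted cleavage site; it does not call or reproduce SignalP. The function name `split_at_cleavage_site` and the placeholder downstream residues are hypothetical and are not taken from any sequence of the invention.

```python
from typing import NamedTuple


class ProcessedProtein(NamedTuple):
    signal_peptide: str
    mature_protein: str


def split_at_cleavage_site(sequence: str, aa1: int) -> ProcessedProtein:
    """Split a precursor sequence after residue aa1 (1-based position)."""
    if not 0 < aa1 < len(sequence):
        raise ValueError("cleavage position must fall inside the sequence")
    return ProcessedProtein(sequence[:aa1], sequence[aa1:])


# Signal sequence as read from a Table 3 row listing AA1: 15, AA2: 16.
TABLE_SIGNAL = "MKTATLLALAATAQA"
# Hypothetical placeholder for downstream residues (NOT from the source).
PLACEHOLDER_MATURE = "GGGGSGGGGS"

result = split_at_cleavage_site(TABLE_SIGNAL + PLACEHOLDER_MATURE, aa1=15)
```

In this sketch the residues up to and including position AA1 are treated as the signal peptide and the residues from AA2 onward as the processed polypeptide.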

In some aspects cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention do not have SPs and/or prepro sequences or “domains.” In one aspect, the invention provides the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention lacking all or part of an SP and/or a prepro domain. In one aspect, the invention provides a nucleic acid sequence encoding a signal sequence (SP) and/or prepro from one cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme operably linked to a nucleic acid sequence of a different cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. Optionally, a signal sequence (SP) and/or prepro domain from a non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non-beta-glucosidase protein may be desired.

The invention also provides isolated, synthetic or recombinant polypeptides comprising signal sequences (SPs), prepro domains and/or catalytic domains (CDs) of the invention and heterologous sequences. The heterologous sequences are sequences not naturally associated (e.g., to an enzyme) with an SP, prepro domain and/or CD. The sequence to which the SP, prepro domain and/or CD are not naturally associated can be on the SP's, prepro domain's and/or CD's amino terminal end, carboxy terminal end, and/or on both ends of the SP and/or CD. In one aspect, the invention provides an isolated, synthetic or recombinant polypeptide comprising (or consisting of) a polypeptide comprising a signal sequence (SP), prepro domain and/or catalytic domain (CD) of the invention with the proviso that it is not associated with any sequence to which it is naturally associated (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme sequence). Similarly in one aspect, the invention provides isolated, synthetic or recombinant nucleic acids encoding these polypeptides. Thus, in one aspect, the isolated, synthetic or recombinant nucleic acid of the invention comprises coding sequence for a signal sequence (SP), prepro domain and/or catalytic domain (CD) of the invention and a heterologous sequence (i.e., a sequence not naturally associated with the signal sequence (SP), prepro domain and/or catalytic domain (CD) of the invention). The heterologous sequence can be on the 3′ terminal end, 5′ terminal end, and/or on both ends of the SP, prepro domain and/or CD coding sequence.

Hybrid (Chimeric) Cellulase, e.g., Endoglucanase, Cellobiohydrolase and/or Beta-Glucosidase Enzymes and Peptide Libraries

In one aspect, the invention provides hybrid cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes and fusion proteins, including peptide libraries, comprising sequences of the invention. The peptide libraries of the invention can be used to isolate peptide modulators (e.g., activators or inhibitors) of targets, such as cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme substrates, receptors and enzymes. The peptide libraries of the invention can be used to identify formal binding partners of targets, such as ligands, e.g., cytokines, hormones and the like. In one aspect, the invention provides chimeric proteins comprising a signal sequence (SP), prepro domain and/or catalytic domain (CD) of the invention or a combination thereof and a heterologous sequence (see above).

In one aspect, the fusion proteins of the invention (e.g., the peptide moiety) are conformationally stabilized (relative to linear peptides) to allow a higher binding affinity for targets. The invention provides fusions of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention and other peptides, including known and random peptides. They can be fused in such a manner that the structure of the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes is not significantly perturbed and the peptide is metabolically or structurally conformationally stabilized. This allows the creation of a peptide library that is easily monitored both for its presence within cells and its quantity.

Amino acid sequence variants of the invention can be characterized by a predetermined nature of the variation, a feature that sets them apart from a naturally occurring form, e.g., an allelic or interspecies variation of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme sequence. In one aspect, the variants of the invention exhibit the same qualitative biological activity as the naturally occurring analogue. Alternatively, the variants can be selected for having modified characteristics. In one aspect, while the site or region for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme variants screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, as discussed herein, for example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants can be done using, e.g., assays of glucan hydrolysis. In alternative aspects, amino acid substitutions can be single residues; insertions can be on the order of from about 1 to 20 amino acids, although considerably larger insertions can be done. Deletions can range from about 1 to about 20, 30, 40, 50, 60, 70 residues or more. To obtain a final derivative with the optimal properties, substitutions, deletions, insertions or any combination thereof may be used. Generally, these changes are done on a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances.
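By way of illustration only, variation at a predetermined site with a non-predetermined mutation can be sketched in Python. The sketch is a toy under stated assumptions and not a protocol of the invention: exhaustive enumeration of the 63 alternative codons stands in for random mutagenesis at the target codon, and the function name `saturate_codon` and the four-codon example sequence are hypothetical.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def saturate_codon(coding_seq: str, codon_index: int) -> List[str]:
    """Return every variant of coding_seq that differs only at one codon.

    codon_index is 0-based. The site is predetermined; the replacement
    codon is not (all alternatives to the original codon are generated).
    """
    start = 3 * codon_index
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0 <= start < len(coding_seq):
        raise ValueError("codon index outside the coding sequence")
    original = coding_seq[start:start + 3]
    variants = []
    for bases in product(BASES, repeat=3):
        codon = "".join(bases)
        if codon != original:
            variants.append(coding_seq[:start] + codon + coding_seq[start + 3:])
    return variants


# Hypothetical 4-codon coding sequence, used only to exercise the function.
EXAMPLE_CDS = "ATGGCTAAATGA"
variants = saturate_codon(EXAMPLE_CDS, codon_index=1)
```

Each member of `variants` would then correspond to one expressed variant to be screened, e.g., by an assay of glucan hydrolysis as noted above.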

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes where the structure of the polypeptide backbone, the secondary or the tertiary structure, e.g., an alpha-helical or beta-sheet structure, has been modified. In one aspect, the charge or hydrophobicity has been modified. In one aspect, the bulk of a side chain has been modified. Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative. For example, substitutions can be made which more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example an alpha-helical or a beta-sheet structure; a charge or a hydrophobic site of the molecule, which can be at an active site; or a side chain. The invention provides substitutions in polypeptides of the invention where (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine. The variants can exhibit the same qualitative biological activity (i.e., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity) although variants can be selected to modify the characteristics of the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes as needed.
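By way of illustration only, the substitution classes (a) through (d) above can be expressed as a small Python classifier. The residue groups below contain only the exemplary residues named in the text (given in one-letter code), so the sets are an assumption of this sketch rather than a complete chemical classification; the function name `classify_substitution` is hypothetical.

```python
from typing import List

# Exemplary residues named in the text, in one-letter code.
HYDROPHILIC = set("ST")        # seryl, threonyl
HYDROPHOBIC = set("LIFVA")     # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
CYS_OR_PRO = set("CP")         # cysteine, proline
ELECTROPOSITIVE = set("KRH")   # lysyl, arginyl, histidyl
ELECTRONEGATIVE = set("ED")    # glutamyl, aspartyl
BULKY = set("F")               # phenylalanine
NO_SIDE_CHAIN = set("G")       # glycine


def _between(a: str, b: str, group1: set, group2: set) -> bool:
    """True if one residue is in group1 and the other in group2 ("for or by")."""
    return (a in group1 and b in group2) or (a in group2 and b in group1)


def classify_substitution(original: str, replacement: str) -> List[str]:
    """Return the classes (a)-(d) that a single-residue substitution falls in."""
    labels: List[str] = []
    if original == replacement:
        return labels
    if _between(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        labels.append("a")
    if original in CYS_OR_PRO or replacement in CYS_OR_PRO:
        labels.append("b")
    if _between(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        labels.append("c")
    if _between(original, replacement, BULKY, NO_SIDE_CHAIN):
        labels.append("d")
    return labels
```

An empty result, e.g., for an alanine to valine change, indicates only that the substitution falls outside the four exemplary less-conservative classes; it is not a prediction of the effect on activity.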

In one aspect, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention comprise epitopes or purification tags, signal sequences or other fusion sequences, etc. In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention can be fused to a random peptide to form a fusion polypeptide. By “fused” or “operably linked” herein is meant that the random peptide and the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme are linked together, in such a manner as to minimize the disruption to the stability of the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme structure, e.g., it retains cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity. The fusion polypeptide (or fusion polynucleotide encoding the fusion polypeptide) can comprise further components as well, including multiple peptides at multiple loops.

In one aspect, the peptides and nucleic acids encoding them are randomized, either fully randomized or they are biased in their randomization, e.g., in nucleotide/residue frequency generally or per position. “Randomized” means that each nucleic acid and peptide consists of essentially random nucleotides and amino acids, respectively. In one aspect, the nucleic acids which give rise to the peptides can be chemically synthesized, and thus may incorporate any nucleotide at any position. Thus, when the nucleic acids are expressed to form peptides, any amino acid residue may be incorporated at any position. The synthetic process can be designed to generate randomized nucleic acids, to allow the formation of all or most of the possible combinations over the length of the nucleic acid, thus forming a library of randomized nucleic acids. The library can provide a sufficiently structurally diverse population of randomized expression products to affect a probabilistically sufficient range of cellular responses to provide one or more cells exhibiting a desired response. Thus, the invention provides an interaction library large enough so that at least one of its members will have a structure that gives it affinity for some molecule, protein, or other factor.
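By way of illustration only, the difference between a fully randomized library and one biased in overall nucleotide frequency can be sketched in Python. This is an in silico toy under stated assumptions, not a synthesis protocol: the function names, the library sizes and the example weights are hypothetical, and only a general (not per-position) bias is modeled.

```python
import random
from typing import List, Optional, Sequence

NUCLEOTIDES = "ACGT"


def random_oligo(length: int, rng: random.Random,
                 weights: Optional[Sequence[float]] = None) -> str:
    """Draw one oligonucleotide; weights (order A, C, G, T) bias the draw."""
    return "".join(rng.choices(NUCLEOTIDES, weights=weights, k=length))


def build_library(size: int, length: int, seed: int = 0,
                  weights: Optional[Sequence[float]] = None) -> List[str]:
    """Build a reproducible library of `size` randomized oligonucleotides."""
    rng = random.Random(seed)
    return [random_oligo(length, rng, weights) for _ in range(size)]


# Fully randomized: every nucleotide equally likely at every position.
fully_random = build_library(size=5, length=12)
# Biased randomization: hypothetical 4:1 preference for C and G.
gc_biased = build_library(size=5, length=12, weights=[1, 4, 4, 1])
```

A per-position bias could be modeled in the same way by supplying a separate weight vector for each position.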

In one aspect, a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention is a multidomain enzyme that comprises a signal peptide, a carbohydrate binding module, a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme catalytic domain, a linker and/or another catalytic domain.

The invention provides methods and sequences for generating chimeric polypeptides which may encode biologically active hybrid polypeptides (e.g., hybrid cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes). In one aspect, the original polynucleotides (e.g., an exemplary nucleic acid of the invention) encode biologically active polypeptides. In one aspect, a method of the invention produces new hybrid polypeptides by utilizing cellular processes which integrate the sequence of the original polynucleotides such that the resulting hybrid polynucleotide encodes a polypeptide demonstrating activities derived, but different, from the original biologically active polypeptides (e.g., cellulase or antibody of the invention). For example, the original polynucleotides may encode a particular enzyme (e.g., cellulase) from or found in different microorganisms. An enzyme encoded by a first polynucleotide from one organism or variant may, for example, function effectively under a particular environmental condition, e.g., high salinity. An enzyme encoded by a second polynucleotide from a different organism or variant may function effectively under a different environmental condition, such as extremely high temperatures. A hybrid polynucleotide containing sequences from the first and second original polynucleotides may encode an enzyme which exhibits characteristics of both enzymes encoded by the original polynucleotides. Thus, the enzyme encoded by the hybrid polynucleotide of the invention may function effectively under environmental conditions shared by each of the enzymes encoded by the first and second polynucleotides, e.g., high salinity and extreme temperatures.

In one aspect, a hybrid polypeptide generated by a method of the invention may exhibit specialized enzyme activity not displayed in the original enzymes. For example, following recombination and/or reductive reassortment of polynucleotides encoding cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes, the resulting hybrid polypeptide encoded by a hybrid polynucleotide can be screened for specialized non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non-beta-glucosidase enzyme activities, e.g., hydrolase, peptidase, phosphorylase, etc., activities, obtained from each of the original enzymes. In one aspect, the hybrid polypeptide is screened to ascertain those chemical functionalities which distinguish the hybrid polypeptide from the original parent polypeptides, such as the temperature, pH or salt concentration at which the hybrid polypeptide functions.

In one aspect, the invention relates to a method for producing a biologically active hybrid polypeptide and screening such a polypeptide for enhanced activity by:

1) introducing at least a first polynucleotide in operable linkage and a second polynucleotide in operable linkage, the at least first polynucleotide and second polynucleotide sharing at least one region of partial sequence homology, into a suitable host cell;
2) growing the host cell under conditions which promote sequence reorganization resulting in a hybrid polynucleotide in operable linkage;
3) expressing a hybrid polypeptide encoded by the hybrid polynucleotide;
4) screening the hybrid polypeptide under conditions which promote identification of enhanced biological activity; and
5) isolating a polynucleotide encoding the hybrid polypeptide.
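By way of illustration only, the role of a shared region of homology in steps 1) and 2) can be sketched as a single in silico crossover in Python. This is a toy model under stated assumptions and not the cellular process itself: exact identity over `k` nucleotides stands in for partial sequence homology, only one crossover is made, and the function names and the two parent sequences are hypothetical.

```python
from typing import Optional, Tuple


def shared_region(a: str, b: str, k: int) -> Optional[Tuple[int, int]]:
    """Return (i, j) for the first k-mer of a (at i) that also occurs in b (at j)."""
    for i in range(len(a) - k + 1):
        j = b.find(a[i:i + k])
        if j != -1:
            return i, j
    return None


def crossover(a: str, b: str, k: int = 6) -> str:
    """Join the 5' part of a to the 3' part of b across a shared k-mer."""
    hit = shared_region(a, b, k)
    if hit is None:
        raise ValueError("parents share no region of the required length")
    i, j = hit
    # Keep parent a through the end of the shared region, then continue in b.
    return a[:i + k] + b[j + k:]


# Hypothetical parent polynucleotides sharing the region "CCCGGG".
PARENT_A = "ATGAAACCCGGGTTTAAA"
PARENT_B = "ATGCCCGGGTTTCCCTGA"
hybrid = crossover(PARENT_A, PARENT_B)
```

Steps 3) through 5), i.e., expression, screening and isolation, are laboratory operations and are not modeled here.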

Isolating and Discovering Cellulase Enzymes

The invention provides methods for isolating and discovering cellulases, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes and the nucleic acids that encode them. Polynucleotides or enzymes may be isolated from individual organisms (“isolates”), collections of organisms that have been grown in defined media (“enrichment cultures”), or uncultivated organisms (“environmental samples”). The organisms can be isolated by, e.g., in vivo biopanning (see discussion, below). The use of a culture-independent approach to derive polynucleotides encoding novel bioactivities from environmental samples is most preferable since it allows one to access untapped resources of biodiversity. Polynucleotides or enzymes also can be isolated from any one of numerous organisms, e.g., bacteria. In addition to whole cells, polynucleotides or enzymes also can be isolated from crude enzyme preparations derived from cultures of these organisms, e.g., bacteria.

“Environmental libraries” are generated from environmental samples and represent the collective genomes of naturally occurring organisms archived in cloning vectors that can be propagated in suitable prokaryotic hosts. Because the cloned DNA is initially extracted directly from environmental samples, the libraries are not limited to the small fraction of prokaryotes that can be grown in pure culture. Additionally, a normalization of the environmental DNA present in these samples could allow more equal representation of the DNA from all of the species present in the original sample. This can dramatically increase the efficiency of finding interesting genes from minor constituents of the sample which may be under-represented by several orders of magnitude compared to the dominant species.

In one aspect, gene libraries generated from one or more uncultivated microorganisms are screened for an activity of interest. Potential pathways encoding bioactive molecules of interest are first captured in prokaryotic cells in the form of gene expression libraries. In one aspect, polynucleotides encoding activities of interest are isolated from such libraries and introduced into a host cell. The host cell is grown under conditions which promote recombination and/or reductive reassortment creating potentially active biomolecules with novel or enhanced activities.

In vivo biopanning may be performed utilizing FACS-based and non-optical (e.g., magnetic) based machines. In one aspect, complex gene libraries are constructed with vectors which contain elements which stabilize transcribed RNA. For example, the inclusion of sequences which result in secondary structures such as hairpins which are designed to flank the transcribed regions of the RNA would serve to enhance their stability, thus increasing their half life within the cell. The probe molecules used in the biopanning process consist of oligonucleotides labeled with reporter molecules that only fluoresce upon binding of the probe to a target molecule. These probes are introduced into the recombinant cells from the library using one of several transformation methods. The probe molecules bind to the transcribed target mRNA resulting in DNA/RNA heteroduplex molecules. Binding of the probe to a target will yield a fluorescent signal which is detected and sorted by the FACS machine during the screening process.

In one aspect, subcloning is performed to further isolate sequences of interest. In subcloning, a portion of DNA is amplified and digested, generally by restriction enzymes, to cut out the desired sequence, and the desired sequence is ligated into a recipient vector and amplified. At each step in subcloning, the portion is examined for the activity of interest, in order to ensure that DNA that encodes the structural protein has not been excluded. The insert may be purified at any step of the subcloning, for example, by gel electrophoresis prior to ligation into a vector or where cells containing the recipient vector and cells not containing the recipient vector are placed on selective media containing, for example, an antibiotic, which will kill the cells not containing the recipient vector. Specific methods of subcloning cDNA inserts into vectors are well-known in the art (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press (1989)). In another aspect, the enzymes of the invention are subclones. Such subclones may differ from the parent clone by, for example, length, a mutation, a tag or a label.

The microorganisms from which the polynucleotide may be discovered, isolated or prepared include prokaryotic microorganisms, such as Eubacteria and Archaebacteria and lower eukaryotic microorganisms such as fungi, some algae and protozoa. Polynucleotides may be discovered, isolated or prepared from samples, e.g., environmental samples, in which case the nucleic acid may be recovered without culturing of an organism or recovered from one or more cultured organisms. In one aspect, such microorganisms may be extremophiles, such as hyperthermophiles, psychrophiles, psychrotrophs, halophiles, barophiles and acidophiles. Polynucleotides encoding enzymes isolated from extremophilic microorganisms can be used. Enzymes of this invention can function at temperatures above 100° C., e.g., as those found in terrestrial hot springs and deep sea thermal vents, or at temperatures below 0° C., e.g., as those found in arctic waters, in a saturated salt environment, e.g., as those found in the Dead Sea, at pH values around 0, e.g., as those found in coal deposits and geothermal sulfur-rich springs, or at pH values greater than 11, e.g., as those found in sewage sludge. In one aspect, enzymes of the invention have high activity throughout a wide range of temperatures and pHs.

Polynucleotides selected and isolated as hereinabove described are introduced into a suitable host cell. A suitable host cell is any cell which is capable of promoting recombination and/or reductive reassortment. The selected polynucleotides are in one aspect already in a vector which includes appropriate control sequences. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or in one aspect, the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-Dextran mediated transfection, or electroporation.

Exemplary hosts include bacterial cells, such as E. coli, Streptomyces, Salmonella typhimurium; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; adenoviruses; and plant cells; see discussion, above. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.

Various mammalian cell culture systems can be employed to express recombinant protein; examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in “SV40-transformed simian cells support the replication of early SV40 mutants” (Gluzman, 1981) and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors can comprise an origin of replication, a suitable promoter and enhancer and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences and 5′ flanking nontranscribed sequences. DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

In another aspect, nucleic acids, polypeptides and methods of the invention are used in biochemical pathways, or to generate novel polynucleotides encoding biochemical pathways from one or more operons or gene clusters or portions thereof. For example, bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved in related processes. The genes are clustered, in structures referred to as “gene clusters,” on a single chromosome and are transcribed together under the control of a single regulatory sequence, including a single promoter which initiates transcription of the entire cluster. Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually as to their function (an example of a biochemical pathway encoded by gene clusters is that for polyketides).

In one aspect, gene cluster DNA is isolated from different organisms and ligated into vectors, e.g., vectors containing expression regulatory sequences which can control and regulate the production of a detectable protein or protein-related array activity from the ligated gene clusters. Use of vectors which have an exceptionally large capacity for exogenous DNA introduction can be appropriate for use with such gene clusters and are described by way of example herein to include the f-factor (or fertility factor) of E. coli. This f-factor of E. coli is a plasmid which effects high-frequency transfer of itself during conjugation and is ideal to achieve and stably propagate large DNA fragments, such as gene clusters from mixed microbial samples. One aspect is to use cloning vectors, referred to as “fosmids” or bacterial artificial chromosome (BAC) vectors. These are derived from E. coli f-factor which is able to stably integrate large segments of genomic DNA. When integrated with DNA from a mixed uncultured environmental sample, this makes it possible to achieve large genomic fragments in the form of a stable “environmental DNA library.” Another type of vector for use in the present invention is a cosmid vector. Cosmid vectors were originally designed to clone and propagate large segments of genomic DNA. Cloning into cosmid vectors is described in detail in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press (1989). Once ligated into an appropriate vector, two or more vectors containing different polyketide synthase gene clusters can be introduced into a suitable host cell. Regions of partial sequence homology shared by the gene clusters will promote processes which result in sequence reorganization resulting in a hybrid gene cluster. The novel hybrid gene cluster can then be screened for enhanced activities not found in the original gene clusters.

Methods for screening for various enzyme activities are known to those of skill in the art and are discussed throughout the present specification, see, e.g., Examples 1, 2 and 3, below. Such methods may be employed when isolating the polypeptides and polynucleotides of the invention.

In one aspect, the invention provides methods for discovering and isolating cellulases, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase, or compounds to modify the activity of these enzymes, using a whole cell approach (see discussion, below). Clones encoding cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase from a genomic DNA library can be screened.

Screening Methodologies and “On-Line” Monitoring Devices

In practicing the methods of the invention, a variety of apparatus and methodologies can be used in conjunction with the polypeptides and nucleic acids of the invention, e.g., to screen polypeptides for cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, to screen compounds as potential modulators, e.g., activators or inhibitors, of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, to screen for antibodies that bind to a polypeptide of the invention, to screen for nucleic acids that hybridize to a nucleic acid of the invention, to screen for cells expressing a polypeptide of the invention and the like. In addition to the array formats described in detail below for screening samples, alternative formats can also be used to practice the methods of the invention. Such formats include, for example, mass spectrometers, chromatographs, e.g., high-throughput HPLC and other forms of liquid chromatography, and smaller formats, such as 1536-well plates, 384-well plates and so on. High throughput screening apparatus can be adapted and used to practice the methods of the invention, see, e.g., U.S. Patent Application Nos. 20020001809; 20050272044.

Capillary Arrays

Nucleic acids or polypeptides of the invention can be immobilized to or applied to an array. Arrays can be used to screen for or monitor libraries of compositions (e.g., small molecules, antibodies, nucleic acids, etc.) for their ability to bind to or modulate the activity of a nucleic acid or a polypeptide of the invention. Capillary arrays, such as the GIGAMATRIX™, Diversa Corporation, San Diego, Calif.; and arrays described in, e.g., U.S. Patent Application No. 20020080350 A1; WO 0231203 A; WO 0244336 A, provide an alternative apparatus for holding and screening samples. In one aspect, the capillary array includes a plurality of capillaries formed into an array of adjacent capillaries, wherein each capillary comprises at least one wall defining a lumen for retaining a sample. The lumen may be cylindrical, square, hexagonal or any other geometric shape so long as the walls form a lumen for retention of a liquid or sample. The capillaries of the capillary array can be held together in close proximity to form a planar structure. The capillaries can be bound together, by being fused (e.g., where the capillaries are made of glass), glued, bonded, or clamped side-by-side. Additionally, the capillary array can include interstitial material disposed between adjacent capillaries in the array, thereby forming a solid planar device containing a plurality of through-holes.

A capillary array can be formed of any number of individual capillaries, for example, a range from 100 to 4,000,000 capillaries. Further, a capillary array having about 100,000 or more individual capillaries can be formed into the standard size and shape of a Microtiter® plate for fitment into standard laboratory equipment. The lumens are filled manually or automatically using either capillary action or microinjection using a thin needle. Samples of interest may subsequently be removed from individual capillaries for further analysis or characterization. For example, a thin, needle-like probe is positioned in fluid communication with a selected capillary to either add or withdraw material from the lumen.

In a single-pot screening assay, the assay components are mixed yielding a solution of interest, prior to insertion into the capillary array. The lumen is filled by capillary action when at least a portion of the array is immersed into a solution of interest. Chemical or biological reactions and/or activity in each capillary are monitored for detectable events. A detectable event is often referred to as a “hit”, which can usually be distinguished from “non-hit” producing capillaries by optical detection. Thus, capillary arrays allow for massively parallel detection of “hits”.
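The optical distinction between “hit” and “non-hit” capillaries described above can be illustrated with a minimal sketch. The specification does not state how a hit is called; the robust threshold used here (median plus a multiple of the median absolute deviation), the function name `call_hits`, the parameter `k` and the signal values are all assumptions chosen only for illustration.

```python
from statistics import median


def call_hits(signals, k=5.0):
    """Return indices of capillaries whose optical signal exceeds a robust threshold.

    Assumed rule (not from the specification): a capillary is a "hit" when its
    signal is greater than median + k * MAD over the whole array.
    """
    med = median(signals)
    # Median absolute deviation: a spread estimate that a few bright hits barely affect.
    mad = median(abs(s - med) for s in signals)
    # Guard against a zero MAD when nearly all capillaries read the same value.
    threshold = med + k * max(mad, 1e-9)
    return [i for i, s in enumerate(signals) if s > threshold]


# Hypothetical fluorescence readings for eight capillaries; index 4 is much brighter.
readings = [1.0, 1.1, 0.9, 1.0, 9.5, 1.2, 0.8, 1.0]
hits = call_hits(readings)
```

A median-based threshold is used in this sketch because, in a large array where most capillaries are non-hits, the median tracks the background level even when a few capillaries are very bright.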

In a multi-pot screening assay, a polypeptide or nucleic acid, e.g., a ligand, can be introduced into a first component, which is introduced into at least a portion of a capillary of a capillary array. An air bubble can then be introduced into the capillary behind the first component. A second component can then be introduced into the capillary, wherein the second component is separated from the first component by the air bubble. The first and second components can then be mixed by applying hydrostatic pressure to both sides of the capillary array to collapse the bubble. The capillary array is then monitored for a detectable event resulting from reaction or non-reaction of the two components.

In a binding screening assay, a sample of interest can be introduced as a first liquid labeled with a detectable particle into a capillary of a capillary array, wherein the lumen of the capillary is coated with a binding material for binding the detectable particle to the lumen. The first liquid may then be removed from the capillary tube, wherein the bound detectable particle is maintained within the capillary, and a second liquid may be introduced into the capillary tube. The capillary is then monitored for a detectable event resulting from reaction or non-reaction of the particle with the second liquid.

Arrays, or “Biochips”

Nucleic acids or polypeptides of the invention can be immobilized to or applied to an array. Arrays can be used to screen for or monitor libraries of compositions (e.g., small molecules, antibodies, nucleic acids, etc.) for their ability to bind to or modulate the activity of a nucleic acid or a polypeptide of the invention. For example, in one aspect of the invention, a monitored parameter is transcript expression of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme gene. One or more, or, all the transcripts of a cell can be measured by hybridization of a sample comprising transcripts of the cell, or, nucleic acids representative of or complementary to transcripts of a cell, by hybridization to immobilized nucleic acids on an array, or “biochip.” By using an “array” of nucleic acids on a microchip, some or all of the transcripts of a cell can be simultaneously quantified. Alternatively, arrays comprising genomic nucleic acid can also be used to determine the genotype of a newly engineered strain made by the methods of the invention. “Polypeptide arrays” can also be used to simultaneously quantify a plurality of proteins. The present invention can be practiced with any known “array,” also referred to as a “microarray” or “nucleic acid array” or “polypeptide array” or “antibody array” or “biochip,” or variation thereof. Arrays are generically a plurality of “spots” or “target elements,” each target element comprising a defined amount of one or more biological molecules, e.g., oligonucleotides, immobilized onto a defined area of a substrate surface for specific binding to a sample molecule, e.g., mRNA transcripts.

The terms “array” or “microarray” or “biochip” or “chip” as used herein refer to a plurality of target elements, each target element comprising a defined amount of one or more polypeptides (including antibodies) or nucleic acids immobilized onto a defined area of a substrate surface, as discussed in further detail, below.

In practicing the methods of the invention, any known array and/or method of making and using arrays can be incorporated in whole or in part, or variations thereof, as described, for example, in U.S. Pat. Nos. 6,277,628; 6,277,489; 6,261,776; 6,258,606; 6,054,270; 6,048,695; 6,045,996; 6,022,963; 6,013,440; 5,965,452; 5,959,098; 5,856,174; 5,830,645; 5,770,456; 5,632,957; 5,556,752; 5,143,854; 5,807,522; 5,800,992; 5,744,305; 5,700,637; 5,434,049; see also, e.g., WO 99/51773; WO 99/09217; WO 97/46313; WO 96/17958; see also, e.g., Johnston (1998) Curr. Biol. 8:R171-R174; Schummer (1997) Biotechniques 23:1087-1092; Kern (1997) Biotechniques 23:120-124; Solinas-Toldo (1997) Genes, Chromosomes & Cancer 20:399-407; Bowtell (1999) Nature Genetics Supp. 21:25-32. See also published U.S. patent applications Nos. 20010018642; 20010019827; 20010016322; 20010014449; 20010014448; 20010012537; 20010008765.

Antibodies and Antibody-Based Screening Methods

The invention provides isolated, synthetic or recombinant antibodies that specifically bind to a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention. These antibodies can be used to isolate, identify or quantify the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention or related polypeptides. These antibodies can be used to isolate other polypeptides within the scope of the invention or other related cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes. The antibodies can be designed to bind to an active site of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme. Thus, the invention provides methods of inhibiting cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes using the antibodies of the invention (see discussion above regarding applications for anti-cellulase, e.g., anti-endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase enzyme compositions of the invention).

The term “antibody” includes a peptide or polypeptide derived from, modeled after or substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, capable of specifically binding an antigen or epitope, see, e.g., Fundamental Immunology, Third Edition, W. E. Paul, ed., Raven Press, N.Y. (1993); Wilson (1994) J. Immunol. Methods 175:267-273; Yarmush (1992) J. Biochem. Biophys. Methods 25:85-97. The term antibody includes antigen-binding portions, i.e., “antigen binding sites,” (e.g., fragments, subsequences, complementarity determining regions (CDRs)) that retain capacity to bind antigen, including (i) a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains; (ii) a F(ab′)2 fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; (iii) a Fd fragment consisting of the VH and CH1 domains; (iv) a Fv fragment consisting of the VL and VH domains of a single arm of an antibody; (v) a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and (vi) an isolated complementarity determining region (CDR). Single chain antibodies are also included by reference in the term “antibody.”

The invention provides fragments of the enzymes of the invention (e.g., peptides) including immunogenic fragments (e.g., subsequences) of a polypeptide of the invention. The invention provides compositions comprising a polypeptide or peptide of the invention and adjuvants or carriers and the like.

The antibodies can be used in immunoprecipitation, staining, immunoaffinity columns, and the like. If desired, nucleic acid sequences encoding for specific antigens can be generated by immunization followed by isolation of polypeptide or nucleic acid, amplification or cloning and immobilization of polypeptide onto an array of the invention. Alternatively, the methods of the invention can be used to modify the structure of an antibody produced by a cell to be modified, e.g., an antibody's affinity can be increased or decreased. Furthermore, the ability to make or modify antibodies can be a phenotype engineered into a cell by the methods of the invention.

Methods of immunization, producing and isolating antibodies (polyclonal and monoclonal) are known to those of skill in the art and described in the scientific and patent literature, see, e.g., Coligan, CURRENT PROTOCOLS IN IMMUNOLOGY, Wiley/Greene, NY (1991); Stites (eds.) BASIC AND CLINICAL IMMUNOLOGY (7th ed.) Lange Medical Publications, Los Altos, Calif. (“Stites”); Goding, MONOCLONAL ANTIBODIES: PRINCIPLES AND PRACTICE (2d ed.) Academic Press, New York, N.Y. (1986); Kohler (1975) Nature 256:495; Harlow (1988) ANTIBODIES, A LABORATORY MANUAL, Cold Spring Harbor Publications, New York. Antibodies also can be generated in vitro, e.g., using recombinant antibody binding site expressing phage display libraries, in addition to the traditional in vivo methods using animals. See, e.g., Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys. Biomol. Struct. 26:27-45.

The polypeptides of the invention or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, may also be used to generate antibodies which bind specifically to the polypeptides or fragments. The resulting antibodies may be used in immunoaffinity chromatography procedures to isolate or purify the polypeptide or to determine whether the polypeptide is present in a biological sample. In such procedures, a protein preparation, such as an extract, or a biological sample is contacted with an antibody capable of specifically binding to one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof.

In immunoaffinity procedures, the antibody is attached to a solid support, such as a bead or other column matrix. The protein preparation is placed in contact with the antibody under conditions in which the antibody specifically binds to one of the polypeptides of the invention, or fragment thereof. After a wash to remove non-specifically bound proteins, the specifically bound polypeptides are eluted.

The ability of proteins in a biological sample to bind to the antibody may be determined using any of a variety of procedures familiar to those skilled in the art. For example, binding may be determined by labeling the antibody with a detectable label such as a fluorescent agent, an enzymatic label, or a radioisotope. Alternatively, binding of the antibody to the sample may be detected using a secondary antibody having such a detectable label thereon. Particular assays include ELISA assays, sandwich assays, radioimmunoassays and Western Blots.

Polyclonal antibodies generated against the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, can be obtained by direct injection of the polypeptides into an animal or by administering the polypeptides to an animal, for example, a nonhuman animal. The antibody so obtained can bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies which may bind to the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from cells expressing that polypeptide.

For preparation of monoclonal antibodies, any technique which provides antibodies produced by continuous cell line cultures can be used. Examples include the hybridoma technique (Kohler and Milstein, Nature, 256:495-497, 1975), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72, 1983) and the EBV-hybridoma technique (Cole, et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. Alternatively, transgenic mice may be used to express humanized antibodies to these polypeptides or fragments thereof.

Antibodies generated against the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, may be used in screening for similar polypeptides from other organisms and samples. In such techniques, polypeptides from the organism are contacted with the antibody and those polypeptides which specifically bind the antibody are detected. Any of the procedures described above may be used to detect antibody binding. One such screening assay is described in “Methods for Measuring Cellulase Activities”, Methods in Enzymology, Vol 160, pp. 87-116.

Kits

The invention provides kits comprising the compositions, e.g., nucleic acids, expression cassettes, vectors, cells, transgenic seeds or plants or plant parts, polypeptides (e.g., a cellulase enzyme) and/or antibodies of the invention. The kits also can contain instructional material teaching the methodologies and industrial, medical and dietary uses of the invention, as described herein.

Whole Cell Engineering and Measuring Metabolic Parameters

The methods of the invention provide whole cell evolution, or whole cell engineering, of a cell to develop a new cell strain having a new phenotype, e.g., a new or modified cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity, by modifying the genetic composition of the cell. See U.S. patent application no. 20040033975.

The genetic composition can be modified by addition to the cell of a nucleic acid of the invention, e.g., a coding sequence for an enzyme of the invention. See, e.g., WO0229032; WO0196551.

To detect the new phenotype, at least one metabolic parameter of a modified cell is monitored in the cell in a “real time” or “on-line” time frame. In one aspect, a plurality of cells, such as a cell culture, is monitored in “real time” or “on-line.” In one aspect, a plurality of metabolic parameters is monitored in “real time” or “on-line.” Metabolic parameters can be monitored using the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention.

Metabolic flux analysis (MFA) is based on a known biochemistry framework. A linearly independent metabolic matrix is constructed based on the law of mass conservation and on the pseudo-steady state hypothesis (PSSH) on the intracellular metabolites. In practicing the methods of the invention, metabolic networks are established, including the:

-   identity of all pathway substrates, products and intermediary metabolites,
-   identity of all the chemical reactions interconverting the pathway metabolites, the stoichiometry of the pathway reactions,
-   identity of all the enzymes catalyzing the reactions, the enzyme reaction kinetics,
-   the regulatory interactions between pathway components, e.g., allosteric interactions, enzyme-enzyme interactions etc,
-   intracellular compartmentalization of enzymes or any other supramolecular organization of the enzymes, and,
-   the presence of any concentration gradients of metabolites, enzymes or effector molecules or diffusion barriers to their movement.

Once the metabolic network for a given strain is built, a mathematical representation in matrix notation can be introduced to estimate the intracellular metabolic fluxes if the on-line metabolome data is available. Metabolic phenotype relies on the changes of the whole metabolic network within a cell. Metabolic phenotype relies on the change of pathway utilization with respect to environmental conditions, genetic regulation, developmental state and the genotype, etc. In one aspect of the methods of the invention, after the on-line MFA calculation, the dynamic behavior of the cells, their phenotype and other properties are analyzed by investigating the pathway utilization. For example, if the glucose supply is increased and the oxygen decreased during the yeast fermentation, the utilization of respiratory pathways will be reduced and/or stopped, and the utilization of the fermentative pathways will dominate. Control of the physiological state of cell cultures will become possible after the pathway analysis. The methods of the invention can help determine how to manipulate the fermentation by determining how to change the substrate supply, temperature, use of inducers, etc. to control the physiological state of cells to move in a desirable direction. In practicing the methods of the invention, the MFA results can also be compared with transcriptome and proteome data to design experiments and protocols for metabolic engineering or gene shuffling, etc.
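The matrix formulation of MFA described above can be illustrated with a minimal sketch. Under the pseudo-steady state hypothesis the stoichiometric matrix S and the flux vector v satisfy S·v = 0, so when some fluxes are measured the remaining ones follow from S_u·v_u = −S_m·v_m. The four-reaction toy network, the flux values and the function names below are hypothetical and are not taken from the specification; the sketch also assumes that the number of unmeasured fluxes equals the number of balanced metabolites, so that the system is square.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("unmeasured fluxes are not uniquely determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_fluxes(S, measured):
    """Estimate all fluxes from S (rows: metabolites, columns: reactions).

    `measured` maps a reaction index to its measured flux. The unmeasured
    fluxes are obtained from the steady-state balance S_u v_u = -S_m v_m.
    """
    n_rxn = len(S[0])
    unknown = [j for j in range(n_rxn) if j not in measured]
    if len(unknown) != len(S):
        raise ValueError("this sketch only handles a square, determined system")
    a = [[row[j] for j in unknown] for row in S]
    b = [-sum(row[j] * v for j, v in measured.items()) for row in S]
    fluxes = dict(measured)
    fluxes.update(zip(unknown, solve_linear(a, b)))
    return [fluxes[j] for j in range(n_rxn)]


# Hypothetical network: v0: -> A, v1: A -> B, v2: B ->, v3: A -> (byproduct)
S = [
    [1, -1, 0, -1],  # mass balance on intracellular metabolite A
    [0, 1, -1, 0],   # mass balance on intracellular metabolite B
]
# Assume uptake (v0) and byproduct secretion (v3) were measured on-line.
fluxes = estimate_fluxes(S, {0: 10.0, 3: 3.0})
```

In this toy case the two internal fluxes follow directly from the two mass balances. Real networks are usually over- or under-determined and are handled with least-squares or constraint-based methods, which this sketch does not attempt.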

In practicing the methods of the invention, any modified or new phenotype can be conferred and detected, including new or improved characteristics in the cell. Any aspect of metabolism or growth can be monitored.

Monitoring Expression of an mRNA Transcript

In one aspect of the invention, the engineered phenotype comprises increasing or decreasing the expression of an mRNA transcript (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme message) or generating new (e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme) transcripts in a cell. This increased or decreased expression can be traced by testing for the presence of a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention or by cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity assays. mRNA transcripts, or messages, also can be detected and quantified by any method known in the art, including, e.g., Northern blots, quantitative amplification reactions, hybridization to arrays, and the like. Quantitative amplification reactions include, e.g., quantitative PCR, including, e.g., quantitative reverse transcription polymerase chain reaction, or RT-PCR; quantitative real time RT-PCR, or “real-time kinetic RT-PCR” (see, e.g., Kreuzer (2001) Br. J. Haematol. 114:313-318; Xia (2001) Transplantation 72:907-914).
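As an illustration of how an increase or decrease in transcript level might be quantified from real-time RT-PCR data, the following minimal sketch uses the comparative threshold-cycle (2^−ΔΔCt) calculation. This calculation is a commonly used analysis that is not named in the specification; it assumes approximately 100% amplification efficiency for both the target and the reference transcript. The function name and the Ct values are hypothetical.

```python
def fold_change(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Relative expression of a target transcript by the 2^-ddCt calculation.

    Assumption (not from the specification): both amplicons double every cycle.
    """
    # Normalize the target to the reference transcript within each sample.
    delta_ct_test = ct_target_test - ct_ref_test
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the engineered (test) cell to the unmodified (control) cell.
    delta_delta_ct = delta_ct_test - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the cellulase message appears three cycles earlier
# in the engineered strain, while the reference transcript is unchanged.
ratio = fold_change(ct_target_test=22.0, ct_ref_test=18.0,
                    ct_target_control=25.0, ct_ref_control=18.0)
```

A ratio above 1 indicates increased expression relative to the control and a ratio below 1 indicates decreased expression; with the hypothetical values above, the three-cycle shift corresponds to an eight-fold increase.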

In one aspect of the invention, the engineered phenotype is generated by knocking out expression of a homologous gene. The gene's coding sequence or one or more transcriptional control elements can be knocked out, e.g., promoters or enhancers. Thus, the expression of a transcript can be completely ablated or only decreased.

In one aspect of the invention, the engineered phenotype comprises increasing the expression of a homologous gene. This can be effected by knocking out of a negative control element, including a transcriptional regulatory element acting in cis- or trans-, or, mutagenizing a positive control element. One or more, or, all the transcripts of a cell can be measured by hybridization of a sample comprising transcripts of the cell; or, nucleic acids representative of or complementary to transcripts of a cell, by hybridization to immobilized nucleic acids on an array.

Monitoring Expression of Polypeptides, Peptides and Amino Acids

In one aspect of the invention, the engineered phenotype comprises increasing or decreasing the expression of a polypeptide (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme) or generating new polypeptides in a cell. This increased or decreased expression can be traced by determining the amount of cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme present or by cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme activity assays. Polypeptides, peptides and amino acids also can be detected and quantified by any method known in the art, including, e.g., nuclear magnetic resonance (NMR), spectrophotometry, radiography (protein radiolabeling), electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), thin layer chromatography (TLC), hyperdiffusion chromatography, various immunological methods, e.g., immunoprecipitation, immunodiffusion, immuno-electrophoresis, radioimmunoassays (RIAs), enzyme-linked immunosorbent assays (ELISAs), immuno-fluorescent assays, gel electrophoresis (e.g., SDS-PAGE), staining with antibodies, fluorescent activated cell sorter (FACS), pyrolysis mass spectrometry, Fourier-Transform Infrared Spectrometry, Raman spectrometry, GC-MS, and LC-Electrospray and cap-LC-tandem-electrospray mass spectrometries, and the like. Novel bioactivities can also be screened using methods, or variations thereof, described in U.S. Pat. No. 6,057,103. Furthermore, as discussed below in detail, one or more, or, all the polypeptides of a cell can be measured using a protein array.

Industrial, Energy, Pharmaceutical and Other Applications

Polypeptides of the invention (e.g., having cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity) can catalyze the breakdown of cellulose. The enzymes of the invention can be highly selective catalysts. The invention provides industrial processes using enzymes of the invention, e.g., in the pharmaceutical or nutrient (diet) supplement industry, the energy industry (e.g., to make “clean” biofuels), in the food and feed industries, e.g., in methods for making food and feed products and food and feed additives. In one aspect, the invention provides processes using enzymes of the invention in the medical industry, e.g., to make pharmaceuticals or dietary aids or supplements, or food supplements and additives. In addition, the invention provides methods for using the enzymes of the invention in bioethanol, including “clean” fuel, production.

The enzymes of the invention can catalyze reactions with exquisite stereo-, regio- and chemo-selectivities. The cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention can be engineered to function in various solvents, operate at extreme pHs (for example, high pHs and low pHs), extreme temperatures (for example, high temperatures and low temperatures), extreme salinity levels (for example, high salinity and low salinity) and catalyze reactions with compounds that are structurally unrelated to their natural, physiological substrates.

Biomass Conversion and Production of Clean Bio Fuels

The invention provides enzymes (including mixtures, or “cocktails” of enzymes) and methods for the conversion of a biomass or any lignocellulosic material (e.g., any composition comprising cellulose, hemicellulose and lignin), to fuels (e.g., bioethanol), in addition to feeds, foods and chemicals. Thus, the compositions and methods of the invention provide effective and sustainable alternatives or adjuncts to use of petroleum-based products, e.g., as a mixture of bioethanol and gasoline. The invention provides organisms expressing enzymes of the invention for participation in chemical cycles involving natural biomass conversion. In one aspect, enzymes and methods for the conversion are used in enzyme ensembles (or “cocktails”) for the efficient depolymerization of cellulosic and hemicellulosic polymers to metabolizable carbon moieties. Exemplary enzyme cocktails are described herein; however, the invention encompasses compositions comprising mixtures of enzymes comprising at least one (any combination of) enzyme(s) of the invention. As discussed above, the invention provides methods for discovering and implementing the most effective of enzymes to enable these important new “biomass conversion” and alternative energy industrial processes.

In one aspect, the polypeptides having cellulolytic activity, e.g., cellulase activity, such as endoglucanase, cellobiohydrolase and/or β-glucosidase (beta-glucosidase) activity, are used in processes for converting lignocellulosic biomass to ethanol. The invention also provides processes for making ethanol (“bioethanol”) from compositions comprising lignocellulosic biomass. The lignocellulose biomass material can be obtained from agricultural crops, as a byproduct of food or feed production, or as lignocellulosic waste products, such as plant residues and waste paper. Examples of suitable plant residues for treatment with polypeptides of the invention include grains, seeds, stems, leaves, hulls, husks, corn cobs, corn stover, straw, grasses (e.g., Indian grass, such as Sorghastrum nutans; or, switch grass, e.g., Panicum species, such as Panicum virgatum), and the like, as well as wood, wood chips, wood pulp, and sawdust. Examples of paper waste suitable for treatment with polypeptides of the invention include discard photocopy paper, computer printer paper, notebook paper, notepad paper, typewriter paper, and the like, as well as newspapers, magazines, cardboard, and paper-based packaging materials.

In one aspect, the enzymes and methods of the invention can be used in conjunction with more “traditional” means of making ethanol from biomass, e.g., as methods comprising hydrolyzing lignocellulosic materials by subjecting dried lignocellulosic material in a reactor to a catalyst comprised of a dilute solution of a strong acid and a metal salt; this can lower the activation energy, or the temperature, of cellulose hydrolysis to obtain higher sugar yields; see, e.g., U.S. Pat. Nos. 6,660,506; 6,423,145.

Another exemplary method that incorporates use of enzymes of the invention comprises hydrolyzing lignocellulosic material containing hemicellulose, cellulose and lignin by subjecting the material to a first stage hydrolysis step in an aqueous medium at a temperature and a pressure chosen to effect primarily depolymerization of hemicellulose without major depolymerization of cellulose to glucose. This step results in a slurry in which the liquid aqueous phase contains dissolved monosaccharides resulting from depolymerization of hemicellulose and a solid phase containing cellulose and lignin. A second stage hydrolysis step can comprise conditions such that at least a major portion of the cellulose is depolymerized, such step resulting in a liquid aqueous phase containing dissolved/soluble depolymerization products of cellulose. See, e.g., U.S. Pat. No. 5,536,325. Enzymes of the invention can be added at any stage of this exemplary process.

Another exemplary method that incorporates use of enzymes of the invention comprises processing a lignocellulose-containing biomass material by one or more stages of dilute acid hydrolysis with about 0.4% to 2% strong acid; and treating an unreacted solid lignocellulosic component of the acid hydrolyzed biomass material by alkaline delignification to produce precursors for biodegradable thermoplastics and derivatives. See, e.g., U.S. Pat. No. 6,409,841. Enzymes of the invention can be added at any stage of this exemplary process.

Another exemplary method that incorporates use of enzymes of the invention comprises prehydrolyzing lignocellulosic material in a prehydrolysis reactor; adding an acidic liquid to the solid lignocellulosic material to make a mixture; heating the mixture to reaction temperature; maintaining reaction temperature for time sufficient to fractionate the lignocellulosic material into a solubilized portion containing at least about 20% of the lignin from the lignocellulosic material and a solid fraction containing cellulose; removing a solubilized portion from the solid fraction while at or near reaction temperature wherein the cellulose in the solid fraction is rendered more amenable to enzymatic digestion; and recovering a solubilized portion. See, e.g., U.S. Pat. No. 5,705,369. Enzymes of the invention can be added at any stage of this exemplary process.

The invention provides methods for making motor fuel compositions (e.g., for spark ignition motors) based on liquid hydrocarbons blended with a fuel grade alcohol made by using an enzyme or a method of the invention. In one aspect, the fuels made by use of an enzyme of the invention comprise, e.g., coal gas liquid- or natural gas liquid-ethanol blends. In one aspect, a co-solvent is biomass-derived 2-methyltetrahydrofuran (MTHF). See, e.g., U.S. Pat. No. 6,712,866.

Methods of the invention for the enzymatic degradation of lignocellulose, e.g., for production of ethanol from lignocellulosic material, can also comprise use of ultrasonic treatment of the biomass material; see, e.g., U.S. Pat. No. 6,333,181.

Another exemplary process for making a biofuel comprising ethanol using enzymes of the invention comprises pretreating a starting material comprising a lignocellulosic feedstock comprising at least hemicellulose and cellulose. In one aspect, the starting material comprises potatoes, soybean (rapeseed), barley, rye, corn, oats, wheat, beets or sugar cane, or a component or waste or food or feed production byproduct. The starting material (“feedstock”) is reacted at conditions which disrupt the plant's fiber structure to effect at least a partial hydrolysis of the hemicellulose and cellulose. Disruptive conditions can comprise, e.g., subjecting the starting material to an average temperature of 180° C. to 270° C. at pH 0.5 to 2.5 for a period of about 5 seconds to 60 minutes; or, a temperature of 220° C. to 270° C., at pH 0.5 to 2.5 for a period of 5 seconds to 120 seconds, or equivalent. This generates a feedstock with increased accessibility to digestion by an enzyme, e.g., a cellulase enzyme of the invention. See, e.g., U.S. Pat. No. 6,090,595.

Exemplary conditions for cellulase hydrolysis of lignocellulosic material include reactions at temperatures between about 30° C. and 48° C., and/or a pH between about 4.0 and 6.0. Other exemplary conditions include a temperature between about 30° C. and 60° C. and a pH between about 4.0 and 8.0.
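As a minimal illustrative sketch only, the two exemplary condition windows above can be expressed as a simple range check. The names HydrolysisWindow, NARROW, BROAD and matching_windows are hypothetical and do not appear in this disclosure. The sketch also assumes that both the temperature bound and the pH bound must be met, although the first window above is recited with "and/or".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HydrolysisWindow:
    """Approximate temperature (deg C) and pH window; hypothetical helper type."""
    temp_min_c: float
    temp_max_c: float
    ph_min: float
    ph_max: float

    def contains(self, temp_c: float, ph: float) -> bool:
        # Assumption: both bounds must hold (the text says "and/or" for one window).
        return (self.temp_min_c <= temp_c <= self.temp_max_c
                and self.ph_min <= ph <= self.ph_max)


# Values taken from the exemplary conditions recited above.
NARROW = HydrolysisWindow(30.0, 48.0, 4.0, 6.0)
BROAD = HydrolysisWindow(30.0, 60.0, 4.0, 8.0)


def matching_windows(temp_c: float, ph: float) -> list[str]:
    """Return the names of the exemplary windows that a reaction condition falls in."""
    names = []
    if NARROW.contains(temp_c, ph):
        names.append("narrow")
    if BROAD.contains(temp_c, ph):
        names.append("broad")
    return names
```

For example, a reaction at 55° C. and pH 7.0 falls only in the broader window, while one at 45° C. and pH 5.0 falls in both.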

Animal Feeds and Food or Feed Additives

In addition to providing dietary aids or supplements, or food supplements and additives for human use, the invention also provides compositions and methods for treating animal feeds and foods and food or feed additives using a polypeptide of the invention, e.g., a protein having a cellulolytic activity, such as a cellulase activity, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention, and/or the antibodies of the invention. The invention provides animal feeds, foods, and additives comprising cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention and/or antibodies of the invention. The animal can be any farm animal or any animal.

The animal feed additive of the invention may be a granulated enzyme product that may readily be mixed with feed components. Alternatively, feed additives of the invention can form a component of a pre-mix. The granulated enzyme product of the invention may be coated or uncoated. The particle size of the enzyme granulates can be compatible with that of feed and pre-mix components. This provides a safe and convenient means of incorporating enzymes into feeds. Alternatively, the animal feed additive of the invention may be a stabilized liquid composition. This may be an aqueous or oil-based slurry. See, e.g., U.S. Pat. No. 6,245,546.

Cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the present invention, in the modification of animal feed or a food, can process the food or feed either in vitro (by modifying components of the feed or food) or in vivo. Polypeptides of the invention can be added to animal feed or food compositions.

In one aspect, an enzyme of the invention is added in combination with another enzyme, e.g., beta-galactosidases, catalases, laccases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases. These enzyme digestion products are more digestible by the animal. Thus, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention can contribute to the available energy of the feed or food, or to the digestibility of the food or feed by breaking down cellulose.

In another aspect, cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention can be supplied by expressing the enzymes directly in transgenic feed crops (as, e.g., transgenic plants, seeds and the like), such as grains, cereals, corn, soy bean, rape seed, lupin and the like. As discussed above, the invention provides transgenic plants, plant parts and plant cells comprising a nucleic acid sequence encoding a polypeptide of the invention. In one aspect, the nucleic acid is expressed such that the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention is produced in recoverable quantities. The cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme can be recovered from any plant or plant part. Alternatively, the plant or plant part containing the recombinant polypeptide can be used as such for improving the quality of a food or feed, e.g., improving nutritional value, palatability, etc.

In one aspect, the enzyme delivery matrix of the invention is in the form of discrete plural particles, pellets or granules. By “granules” is meant particles that are compressed or compacted, such as by pelletizing, extrusion, or similar compacting, to remove water from the matrix. Such compression or compacting of the particles also promotes intraparticle cohesion of the particles. For example, the granules can be prepared by pelletizing the grain-based substrate in a pellet mill. The pellets prepared thereby are ground or crumbled to a granule size suitable for use as an adjuvant in animal feed. Since the matrix is itself approved for use in animal feed, it can be used as a diluent for delivery of enzymes in animal feed.

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme contained in the enzyme delivery matrix and methods of the invention is a thermostable cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme, as described herein, so as to resist inactivation of the enzyme during manufacture, where elevated temperatures and/or steam may be employed to prepare the pelletized enzyme delivery matrix. During digestion of feed containing the enzyme delivery matrix of the invention, aqueous digestive fluids will cause release of the active enzyme. Other types of thermostable enzymes and nutritional supplements that are thermostable can also be incorporated in the delivery matrix for release under any type of aqueous conditions.

In one aspect, a coating is applied to the enzyme matrix particles for many different purposes, such as to add a flavor or nutrition supplement to animal feed, to delay release of animal feed supplements and enzymes in gastric conditions, and the like. In one aspect, the coating is applied to achieve a functional goal, for example, whenever it is desirable to slow release of the enzyme from the matrix particles or to control the conditions under which the enzyme will be released. The composition of the coating material can be such that it is selectively broken down by an agent to which it is susceptible (such as heat, acid or base, enzymes or other chemicals). Alternatively, two or more coatings susceptible to different such breakdown agents may be consecutively applied to the matrix particles.

The invention is also directed towards a process for preparing an enzyme-releasing matrix. In accordance with the invention, the process comprises providing discrete plural particles of a grain-based substrate in a particle size suitable for use as an enzyme-releasing matrix, wherein the particles comprise a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme encoded by an amino acid sequence of the invention. In one aspect, the process includes compacting or compressing the particles of enzyme-releasing matrix into granules, which in one aspect is accomplished by pelletizing. The mold inhibitor and cohesiveness agent, when used, can be added at any suitable time, and in one aspect are mixed with the grain-based substrate in the desired proportions prior to pelletizing of the grain-based substrate. Moisture content in the pellet mill feed in one aspect is in the ranges set forth above with respect to the moisture content in the finished product, and in one aspect is about 14-15%. In one aspect, moisture is added to the feedstock in the form of an aqueous preparation of the enzyme to bring the feedstock to this moisture content. The temperature in the pellet mill in one aspect is brought to about 82° C. with steam. The pellet mill may be operated under any conditions that impart sufficient work to the feedstock to provide pellets. The pelleting process itself is a cost-effective process for removing water from the enzyme-containing composition.
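By way of a hedged illustration, the amount of aqueous enzyme preparation needed to bring a feedstock to the moisture content noted above can be estimated with a simple mass balance. The function name water_to_add is hypothetical, and the sketch assumes that moisture is expressed on a wet (total mass) basis and that the added preparation is treated as pure water; neither assumption is stated in this disclosure.

```python
def water_to_add(feed_mass_kg: float, initial_moisture: float,
                 target_moisture: float) -> float:
    """Estimate the mass of water (kg) to add to reach a target moisture fraction.

    Assumptions (not from the disclosure): moisture fractions are on a wet
    basis, and the added aqueous enzyme preparation is counted as pure water.
    Mass balance: (m0 * M + w) / (M + w) = m1  =>  w = M * (m1 - m0) / (1 - m1)
    """
    if not 0.0 <= initial_moisture < 1.0 or not 0.0 < target_moisture < 1.0:
        raise ValueError("moisture fractions must lie between 0 and 1")
    if target_moisture < initial_moisture:
        raise ValueError("target moisture is below the initial moisture")
    return feed_mass_kg * (target_moisture - initial_moisture) / (1.0 - target_moisture)
```

Under these assumptions, bringing 1000 kg of feedstock from a hypothetical 10% moisture to 14.5% moisture would call for roughly 52.6 kg of added water.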

The compositions and methods of the invention can be practiced in conjunction with administration of prebiotics, which are high molecular weight sugars, e.g., fructo-oligosaccharides (FOS); galacto-oligosaccharides (GOS), GRAS (Generally Recognized As Safe) material. These prebiotics can be metabolized by some probiotic lactic acid bacteria (LAB). They are non-digestible by the majority of intestinal microbes.

Treating Foods and Food Processing

The invention provides foods and feeds comprising enzymes of the invention, and methods for using enzymes of the invention in processing foods and feeds. Cellulases, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention have numerous applications in the food processing industry. The invention provides methods for hydrolyzing cellulose-comprising compositions, including, e.g., a plant cell, a bacterial cell, a yeast cell, an insect cell, or an animal cell, or any plant or plant part, or any food or feed, a waste product and the like.

For example, the invention provides feeds or foods comprising a cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzyme of the invention, e.g., in a feed, a liquid, e.g., a beverage (such as a fruit juice or a beer), a bread or a dough or a bread product, or a drink (e.g., a beer) or a beverage precursor (e.g., a wort).

The food treatment processes of the invention can also include the use of any combination of other enzymes such as tryptophanases or tyrosine decarboxylases, laccases, catalases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases.

In one aspect, the invention provides enzymes and processes for hydrolyzing liquid (liquefied) and granular starch. Such starch can be derived from any source, e.g., beet, cane sugar, potato, corn, wheat, milo, sorghum, rye or bulgur. The invention applies to any plant starch source, e.g., a grain starch source, which is useful in liquefaction (for example, to make bioethanol), including any other grain or vegetable source known to produce starch suitable for liquefaction. The methods of the invention comprise liquefying starch (e.g., making bioethanol) from any natural material, such as rice, germinated rice, corn, barley, milo, wheat, legumes, potato, beet, cane sugar and sweet potato. The liquefying process can substantially hydrolyze the starch to produce a syrup. The temperature range of the liquefaction can be any liquefaction temperature which is known to be effective in liquefying starch. For example, the temperature of the starch can be from about 80° C. to about 115° C., from about 100° C. to about 110° C., or from about 105° C. to about 108° C. The bioethanols made using the enzymes and processes of the invention can be used as fuels or in fuels (e.g., auto fuels), e.g., as discussed below, in addition to their use in (or for making) foods and feeds, including alcoholic beverages.

Waste Treatment

The invention provides enzymes for use in waste treatment. Cellulases, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention can be used in a variety of waste treatment or related industrial applications, e.g., in waste treatment related to biomass conversion to generate fuels. For example, in one aspect, the invention provides a solid and/or liquid waste digestion process using cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention. The methods can comprise reducing the mass and volume of substantially untreated solid waste. Solid waste can be treated with an enzymatic digestive process in the presence of an enzymatic solution (including cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes of the invention) at a controlled temperature. This results in a reaction without appreciable bacterial fermentation from added microorganisms. The solid waste is converted into a liquefied waste and any residual solid waste. The resulting liquefied waste can be separated from any residual solid waste. See, e.g., U.S. Pat. No. 5,709,796.

In one aspect, the compositions and methods of the invention are used for odor removal, odor prevention or odor reduction, e.g., in animal waste lagoons, e.g., on swine farms, in other animal waste management systems, or in any industrial or food processing application.

The enzymes and methods for the conversion of biomass (e.g., lignocellulosic materials) to fuels (e.g., bioethanol) can incorporate the treatment/recycling of municipal solid waste material, including waste obtained directly from a municipality or municipal solid waste that was previously land-filled and subsequently recovered, or sewage sludge, e.g., in the form of sewage sludge cake which contains substantial amounts of cellulosic material. Since sewage sludge cakes will normally not contain substantial amounts of recyclable materials (aluminum, glass, plastics, etc.), they can be directly treated with concentrated sulfuric acid (to reduce the heavy metal content of the cellulosic component of the waste) and processed in the ethanol production system. See, e.g., U.S. Pat. Nos. 6,267,309; 5,975,439.

Another exemplary method using enzymes of the invention for recovering organic and inorganic matter from waste material comprises sterilizing a solid organic matter and softening it by subjecting it to heat and pressure. This exemplary process may be carried out by first agitating waste material and then subjecting it to heat and pressure, which sterilizes it and softens the organic matter contained therein. In one aspect, after heating under pressure, the pressure may be suddenly released from a perforated chamber to force the softened organic matter outwardly through perforations of the container, thus separating the organic matter from the solid inorganic matter. The softened, sterilized organic matter is then fermented in a fermentation chamber, e.g., using enzymes of the invention, e.g., to form a mash. The mash may be subjected to further processing by centrifuge, distillation column and/or anaerobic digester to recover fuels such as ethanol and methane, and animal feed supplements. See, e.g., U.S. Pat. No. 6,251,643.

Enzymes of the invention can also be used in processes, e.g., pretreatments, to reduce the odor of an industrial waste, or a waste generated from an animal production facility, and the like. For example, enzymes of the invention can be used to treat an animal waste in a waste holding facility to enhance efficient degradation of large amounts of organic matter with reduced odor. The process can also include inoculation with sulfide-utilizing bacteria and organic digesting bacteria and lytic enzymes (in addition to an enzyme of the invention). See, e.g., U.S. Pat. No. 5,958,758.

Enzymes of the invention can also be used in mobile systems, e.g., batch type reactors, for bioremediation of aqueous, hazardous wastes, e.g., as described in U.S. Pat. No. 5,833,857. Batch type reactors can be large vessels having circulatory capability wherein bacteria (e.g., expressing an enzyme of the invention) are maintained in an efficient state by nutrients being fed into the reactor. Such systems can be used where effluent can be delivered to the reactor or the reactor is built into a waste water treatment system. Enzymes of the invention can also be used in treatment systems for use at small or temporary remote locations, e.g., portable, high volume, highly efficient, versatile waste water treatment systems.

The waste treatment processes of the invention can include the use of any combination of other enzymes such as other cellulase, e.g., endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase enzymes, catalases, laccases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, phytases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases.

Detergent Compositions

The invention provides detergent compositions comprising one or more polypeptides of the invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity) and methods of making and using these compositions. The invention incorporates all methods of making and using detergent compositions, see, e.g., U.S. Pat. Nos. 6,413,928; 6,399,561; 6,365,561; 6,380,147. The detergent compositions can be a one- or two-part aqueous composition, a non-aqueous liquid composition, a cast solid, a granular form, a particulate form, a compressed tablet, a gel, a paste and/or a slurry form. The invention also provides methods capable of a rapid removal of gross food soils, films of food residue and other minor food compositions using these detergent compositions. Enzymes of the invention can facilitate the removal of starchy stains by means of catalytic hydrolysis of the starch polysaccharide. Enzymes of the invention can be used in dishwashing detergents or in textile laundering detergents.

The actual active enzyme content depends upon the method of manufacture of a detergent composition and is not critical, assuming the detergent solution has the desired enzymatic activity. In one aspect, the amount of glucosidase present in the final solution ranges from about 0.001 mg to 0.5 mg per gram of the detergent composition. The particular enzyme chosen for use in the process and products of this invention depends upon the conditions of final utility, including the physical product form, use pH, use temperature, and soil types to be degraded or altered. The enzyme can be chosen to provide optimum activity and stability for any given set of utility conditions. In one aspect, the polypeptides of the present invention are active in the pH ranges of from about 4 to about 12 and in the temperature range of from about 20° C. to about 95° C. The detergents of the invention can comprise cationic, semi-polar nonionic or zwitterionic surfactants; or, mixtures thereof.
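As a rough illustrative calculation, the exemplary dosing range above (about 0.001 mg to 0.5 mg of enzyme per gram of detergent composition) scales linearly with the batch size. The function name enzyme_mass_range_mg is hypothetical and is used here only to make the arithmetic concrete.

```python
# Exemplary bounds recited above, in mg of enzyme per gram of detergent.
MIN_MG_PER_G = 0.001
MAX_MG_PER_G = 0.5


def enzyme_mass_range_mg(detergent_mass_g: float) -> tuple[float, float]:
    """Return the (lower, upper) enzyme mass in mg for a given detergent mass in g."""
    if detergent_mass_g < 0:
        raise ValueError("detergent mass must be non-negative")
    return (MIN_MG_PER_G * detergent_mass_g, MAX_MG_PER_G * detergent_mass_g)
```

For a 100 g portion of detergent, this range corresponds to about 0.1 mg to 50 mg of enzyme.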

Enzymes of the present invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity) can be formulated into powdered and liquid detergents having pH between 4.0 and 12.0 at levels of about 0.01 to about 5% (preferably 0.1% to 0.5%) by weight. These detergent compositions can also include other enzymes such as known proteases, cellulases, lipases or endoglycosidases, as well as builders and stabilizers. The addition of enzymes of the invention to conventional cleaning compositions does not create any special use limitation. In other words, any temperature and pH suitable for the detergent is also suitable for the present compositions as long as the pH is within the above range, and the temperature is below the described enzyme's denaturing temperature. In addition, the polypeptides of the invention can be used in a cleaning composition without detergents, again either alone or in combination with builders and stabilizers.
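The weight-percent levels above can be checked with a short sketch such as the following. The function name classify_loading and its return labels are hypothetical, and the sketch treats the recited "about" bounds as exact cut-offs purely for illustration.

```python
def classify_loading(enzyme_mass_g: float, total_mass_g: float) -> str:
    """Classify an enzyme loading against the exemplary weight-percent levels.

    Assumption: the "about" bounds (0.01-5%, preferably 0.1-0.5%) are treated
    as exact cut-offs, which the disclosure does not require.
    """
    if total_mass_g <= 0 or enzyme_mass_g < 0:
        raise ValueError("masses must be positive (total) and non-negative (enzyme)")
    percent = 100.0 * enzyme_mass_g / total_mass_g
    if 0.1 <= percent <= 0.5:
        return "preferred"
    if 0.01 <= percent <= 5.0:
        return "within range"
    return "outside range"
```

For instance, 0.3 g of enzyme in 100 g of detergent (0.3% by weight) falls in the preferred band, while 2 g in 100 g is within the broader range only.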

The present invention provides cleaning compositions including detergent compositions for cleaning hard surfaces, detergent compositions for cleaning fabrics, dishwashing compositions, oral cleaning compositions, denture cleaning compositions, and contact lens cleaning solutions.

In one aspect, the invention provides a method for washing an object comprising contacting the object with a polypeptide of the invention under conditions sufficient for washing. A polypeptide of the invention may be included as a detergent additive. The detergent composition of the invention may, for example, be formulated as a hand or machine laundry detergent composition comprising a polypeptide of the invention. A laundry additive suitable for pre-treatment of stained fabrics can comprise a polypeptide of the invention. A fabric softener composition can comprise a polypeptide of the invention. Alternatively, a polypeptide of the invention can be formulated as a detergent composition for use in general household hard surface cleaning operations. In alternative aspects, detergent additives and detergent compositions of the invention may comprise one or more other enzymes such as a protease, a lipase, a cutinase, another glucosidase, a carbohydrase, another cellulase, a pectinase, a mannanase, an arabinase, a galactanase, a xylanase, an oxidase, e.g., a laccase, and/or a peroxidase. The properties of the enzyme(s) of the invention are chosen to be compatible with the selected detergent (i.e., pH-optimum, compatibility with other enzymatic and non-enzymatic ingredients, etc.) and the enzyme(s) is present in effective amounts. In one aspect, enzymes of the invention are used to remove malodorous materials from fabrics. Various detergent compositions and methods for making them that can be used in practicing the invention are described in, e.g., U.S. Pat. Nos. 6,333,301; 6,329,333; 6,326,341; 6,297,038; 6,309,871; 6,204,232; 6,197,070; 5,856,164.

The detergents and related processes of the invention can also include the use of any combination of other enzymes such as tryptophanases or tyrosine decarboxylases, laccases, catalases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases.

Treating Fabrics and Textiles

The invention provides methods of treating fabrics and textiles using one or more polypeptides of the invention, e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity. The polypeptides of the invention can be used in any fabric-treating method, which are well known in the art, see, e.g., U.S. Pat. No. 6,077,316. For example, in one aspect, the feel and appearance of a fabric is improved by a method comprising contacting the fabric with an enzyme of the invention in a solution. In one aspect, the fabric is treated with the solution under pressure.

In one aspect, the enzymes of the invention are applied during or after the weaving of textiles, or during the desizing stage, or one or more additional fabric processing steps. During the weaving of textiles, the threads are exposed to considerable mechanical strain. Prior to weaving on mechanical looms, warp yarns are often coated with sizing starch or starch derivatives in order to increase their tensile strength and to prevent breaking. The enzymes of the invention can be applied to remove this sizing starch or these starch derivatives. After the textiles have been woven, a fabric can proceed to a desizing stage. This can be followed by one or more additional fabric processing steps. Desizing is the act of removing size from textiles. After weaving, the size coating must be removed before further processing the fabric in order to ensure a homogeneous and wash-proof result. The invention provides a method of desizing comprising enzymatic hydrolysis of the size by the action of an enzyme of the invention.

The enzymes of the invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity) can be used to desize fabrics, including cotton-containing fabrics, as detergent additives, e.g., in aqueous compositions. The invention provides methods for producing a stonewashed look on indigo-dyed denim fabric and garments. For the manufacture of clothes, the fabric can be cut and sewn into clothes or garments, which are afterwards finished. In particular, for the manufacture of denim jeans, different enzymatic finishing methods have been developed. The finishing of denim garments normally is initiated with an enzymatic desizing step, during which garments are subjected to the action of amylolytic enzymes in order to provide softness to the fabric and make the cotton more accessible to the subsequent enzymatic finishing steps. The invention provides methods of finishing denim garments (e.g., a “bio-stoning process”), enzymatic desizing and providing softness to fabrics using the enzymes of the invention. The invention provides methods for quickly softening denim garments in a desizing and/or finishing process.

The invention also provides disinfectants comprising enzymes of the invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity).

The fabric or textile treatment processes of the invention can also include the use of any combination of other enzymes such as tryptophanases or tyrosine decarboxylases, laccases, catalases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases.

Paper or Pulp Treatment

The enzymes of the invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity) can be used in paper or pulp treatment or paper deinking. For example, in one aspect, the invention provides a paper treatment process using enzymes of the invention. In one aspect, the enzymes of the invention can be used to modify starch in the paper, thereby converting it into a liquefied form. In another aspect, the enzymes of the invention can be used to treat paper components of recycled photocopied paper during chemical and enzymatic deinking processes. In one aspect, enzymes of the invention can be used in combination with other enzymes, including other cellulases (including other endoglucanases, cellobiohydrolases and/or beta-glucosidases). The wood, paper, paper product or pulp can be treated by the following three processes: 1) disintegration in the presence of an enzyme of the invention, 2) disintegration with a deinking chemical and an enzyme of the invention, and/or 3) disintegration after soaking with an enzyme of the invention. The recycled paper treated with an enzyme of the invention can have a higher brightness due to removal of toner particles as compared to the paper treated with just cellulase. While the invention is not limited by any particular mechanism, the effect of an enzyme of the invention may be due to its behavior as a surface-active agent in pulp suspension.

The invention provides methods of treating paper and paper pulp using one or more polypeptides of the invention. The polypeptides of the invention can be used in any paper- or pulp-treating method, which are well known in the art, see, e.g., U.S. Pat. Nos. 6,241,849; 6,066,233; 5,582,681. For example, in one aspect, the invention provides a method for deinking and decolorizing a printed paper containing a dye, comprising pulping a printed paper to obtain a pulp slurry, and dislodging an ink from the pulp slurry in the presence of an enzyme of the invention (other enzymes can also be added). In another aspect, the invention provides a method for enhancing the freeness of pulp, e.g., pulp made from secondary fiber, by adding an enzymatic mixture comprising an enzyme of the invention (can also include other enzymes, e.g., pectinase enzymes) to the pulp and treating under conditions to cause a reaction to produce an enzymatically treated pulp. The freeness of the enzymatically treated pulp is increased from the initial freeness of the secondary fiber pulp without a loss in brightness.

The paper, wood or pulp treatment or recycling processes of the invention can also include the use of any combination of other enzymes such as tryptophanases or tyrosine decarboxylases, laccases, catalases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases.

Repulping: Treatment of Lignocellulosic Materials

The invention also provides a method for the treatment of lignocellulosic fibers, wherein the fibers are treated with a polypeptide of the invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity), in an amount which is efficient for improving the fiber properties. The enzymes of the invention may also be used in the production or recycling of lignocellulosic materials such as pulp, paper and cardboard, from starch reinforced waste paper and cardboard, especially where repulping or recycling occurs at pH above 7 and where the enzymes of the invention can facilitate the disintegration of the waste material through degradation of the reinforcing starch. The enzymes of the invention can be useful in a process for producing a papermaking pulp from starch-coated printed paper. The process may be performed as described in, e.g., WO 95/14807. An exemplary process comprises disintegrating the paper to produce a pulp, treating with a starch-degrading enzyme before, during or after the disintegrating, and separating ink particles from the pulp after disintegrating and enzyme treatment. See also U.S. Pat. No. 6,309,871 and other US patents cited herein. Thus, the invention includes a method for enzymatic deinking of recycled paper pulp, wherein the polypeptide is applied in an amount which is efficient for effective de-inking of the fiber surface.

Brewing and Fermenting

The invention provides methods of brewing (e.g., fermenting) beer comprising an enzyme of the invention, e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity. In one exemplary process, starch-containing raw materials are disintegrated and processed to form a malt. An enzyme of the invention is used at any point in the fermentation process. For example, enzymes of the invention can be used in the processing of barley malt. The major raw material of beer brewing is barley malt. This can be a three stage process. First, the barley grain can be steeped to increase water content, e.g., to about 40%. Second, the grain can be germinated by incubation at 15-25° C. for 3 to 6 days, when enzyme synthesis is stimulated under the control of gibberellins. During this time enzyme levels rise significantly. In one aspect, enzymes of the invention are added at this (or any other) stage of the process. The action of the enzyme results in an increase in fermentable reducing sugars. This can be expressed as the diastatic power, DP, which can rise from around 80 to 190 in 5 days at 12° C.

Enzymes of the invention can be used in any beer producing process, as described, e.g., in U.S. Pat. Nos. 5,762,991; 5,536,650; 5,405,624; 5,021,246; 4,788,066.

Increasing the Flow of Production Fluids from a Subterranean Formation

The invention also includes a method using an enzyme of the invention (e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity), wherein the method increases the flow of production fluids from a subterranean formation by removing viscous, starch-containing, damaging fluids formed during production operations; these fluids can be found within the subterranean formation which surrounds a completed well bore. Thus, this method of the invention results in production fluids being able to flow from the well bore. This method of the invention also addresses the problem of damaging fluids reducing the flow of production fluids from a formation below expected flow rates. In one aspect, the invention provides for formulating an enzyme treatment (using an enzyme of the invention) by blending together an aqueous fluid and a polypeptide of the invention; pumping the enzyme treatment to a desired location within the well bore; allowing the enzyme treatment to degrade the viscous, starch-containing, damaging fluid, whereby the fluid can be removed from the subterranean formation to the well surface; and wherein the enzyme treatment is effective to attack the alpha glucosidic linkages in the starch-containing fluid.

The subterranean formation enzyme treatment processes of the invention can also include the use of any combination of other enzymes such as tryptophanases or tyrosine decarboxylases, laccases, catalases, other cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases, transglutaminases, pectin methylesterases and/or other cellobiohydrolases.

Pharmaceutical Compositions and Dietary Supplements

The invention also provides pharmaceutical compositions and dietary supplements (e.g., dietary aids) comprising a cellulase of the invention (e.g., enzymes having endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity). The cellulase activity comprises endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity. In one aspect, the pharmaceutical compositions and dietary supplements (e.g., dietary aids) are formulated for oral ingestion, e.g., to improve the digestibility of foods and feeds having a high cellulose or lignocellulosic component.

Periodontal treatment compounds can comprise an enzyme of the invention, e.g., as described in U.S. Pat. No. 6,776,979. Compositions and methods for the treatment or prophylaxis of acidic gut syndrome can comprise an enzyme of the invention, e.g., as described in U.S. Pat. No. 6,468,964.

In another aspect, wound dressings, implants and the like comprise antimicrobial (e.g., antibiotic-acting) enzymes, including an enzyme of the invention (including, e.g., exemplary sequences of the invention). Enzymes of the invention can also be used in alginate dressings, antimicrobial barrier dressings, burn dressings, compression bandages, diagnostic tools, gel dressings, hydro-selective dressings, hydrocellular (foam) dressings, hydrocolloid dressings, I.V. dressings, incise drapes, low adherent dressings, odor absorbing dressings, paste bandages, post operative dressings, scar management, skin care, transparent film dressings and/or wound closure. Enzymes of the invention can be used in wound cleansing, wound bed preparation, to treat pressure ulcers, leg ulcers, burns, diabetic foot ulcers, scars, IV fixation, surgical wounds and minor wounds. Enzymes of the invention can be used in sterile enzymatic debriding compositions, e.g., ointments. In various aspects, the cellulase is formulated as a tablet, gel, pill, implant, liquid, spray, powder, food, feed pellet or as an encapsulated formulation.

Biodefense Applications

In other aspects, cellulases of the invention (e.g., enzymes having endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity) can be used in biodefense (e.g., destruction of spores or bacteria comprising a lignocellulosic material). Use of cellulases of the invention in biodefense applications offers a significant benefit, in that they can be very rapidly developed against any currently unknown biological warfare agents or those of the future. In addition, cellulases of the invention can be used for decontamination of affected environments. In one aspect, the invention provides a biodefense or bio-detoxifying agent comprising a polypeptide having a cellulase activity, wherein the polypeptide comprises a sequence of the invention (including, e.g., exemplary sequences of the invention), or a polypeptide encoded by a nucleic acid of the invention (including, e.g., exemplary sequences of the invention), wherein optionally the polypeptide has activity comprising endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity.

The following examples are offered to illustrate, but not to limit, the claimed invention.

EXAMPLES

Example 1 GIGAMATRIX™ Screen

In one aspect, the methods of the invention use Diversa Corporation's proprietary GIGAMATRIX™ platform; see PCT Patent Publication No. WO 01/38583; U.S. patent application no. 20050046833; 20020080350; U.S. Pat. No. 6,918,738; Design Pat. No. D480,814. For example, in one aspect, GIGAMATRIX™ is used in methods to determine if a polypeptide has cellulase activity and is within the scope of the invention, or to identify and isolate a polypeptide having cellulolytic activity, e.g., cellulase activity, such as endoglucanase, cellobiohydrolase and/or β-glucosidase (beta-glucosidase) activity.

A GIGAMATRIX™ platform can include an ultra-high throughput screen based on a 100,000 well microplate with the dimensions of a conventional 96 well plate. In this example, the GIGAMATRIX™ screen can be implemented using 2 substrates: methylumbelliferyl cellobioside (MUC) and methylumbelliferyl lactoside (MUL). Phagemid versions of different clones can be screened because the substrate diffuses into cells and fluorescence was thought to be more easily detectable. A host strain lacking beta-galactosidase can be used in order to decrease activity on the lactoside substrate. The lactoside substrate can result in fewer hits and can be deemed more specific than the cellobiose substrate. In addition, the lactoside substrate can result in fewer beta-glucosidase hits. A secondary screening can consist of plating the clones on agar plates and then colony picking into 384 well plates containing media and MUL. Active clones against MUL are differentiated from a background of inactive clones. Individual clones can then be grown overnight and fluorescence measured. The most active hits can then be picked for sequencing.

Characterization of Enzyme and Substrate Activity

The hits discovered in the GIGAMATRIX™ screen can first be screened against cellohexaose to determine action pattern on a cellulose oligomer. Clones can be grown overnight in TB media containing antibiotic; cells can then be lysed and lysates clarified by centrifugation. Subclones can be grown to an OD600=0.5, induced with an appropriate inducer and then grown an additional 3 h before lysing the cells and clarifying the lysate. Genomic clones will generally have less activity than a subclone, but are a more facile way of assessing activity in a large range of clones. Initial studies can be performed using thin layer chromatography (TLC) for endpoint reactions, usually run for 24 h. Enzymes can also be tested on phosphoric acid swollen cellulose (PASC), which is crystalline cellulose that is made more amorphous through swelling by acid treatment.

Cellulases which are active against PASC can also release cellobiose as well as cellotriose and/or glucose. The clones from the GIGAMATRIX™ discovery effort can also be tested against PASC and on cellulosic substrates such as cellohexaose (Seikagaku, Japan). Thin layer chromatography (TLC) experiments can be used to show that clones are able to hydrolyze the cellohexaose. Of these clones, some are able to generate glucose as the final product. Several enzymes can produce cellobiose and/or larger fragments, but when the exact nature of the product pattern cannot be discerned from the TLC experiments, a capillary electrophoresis (CE) method can also be used.

Example 2 Capillary Electrophoresis

In some aspects, Capillary Electrophoresis (CE) is used in assays to screen for enzyme activity, e.g., CE is used in methods to determine if a polypeptide has cellulolytic activity, e.g., cellulase activity, such as endoglucanase, cellobiohydrolase and/or β-glucosidase (beta-glucosidase) activity, and is within the scope of the invention, or to identify and isolate a polypeptide having cellulolytic activity, e.g., cellulase activity. Capillary Electrophoresis (CE) offers the advantages of faster run times and greater assay sensitivity. The CE method can use 1-aminopyrene-3,6,8-trisulfonate (APTS) as the fluorophore and can be optimized for use with sugars and sugar oligomers (Guttman (1996) High-resolution capillary gel electrophoresis of reducing oligosaccharides labeled with 1-aminopyrene-3,6,8-trisulfonate. Anal. Biochem 233:234-242). Enzymes that are active on cellohexaose can be subjected to tests on phosphoric acid swollen cellulose as well as cellohexaose. Genes can be subcloned, expressed and partially purified using a nickel-chelating column. Enzymes can be incubated with substrate for 1 h and the products analyzed using a 10 cm or 48 cm capillary. Cellohexaose elutes at 2 and 9 minutes for the 10 and 48 cm capillaries, respectively. The 48 cm capillary gives better separation of products in case there are low amounts of sugar or if there are contaminants in the mixture. The CE method can be implemented for studies on enzymes from the GIGAMATRIX™ discovery that show good activity on cellohexaose with TLC detection.

Glycosyl hydrolase family 5 contains mainly endoglucanases, but there are examples of cellobiohydrolases. CelO from Clostridium thermocellum has been characterized as a cellobiohydrolase based on its release of only cellobiose from amorphous and crystalline cellulose (Zverlov (2002) A newly described cellulosomal cellobiohydrolase, CelO, from Clostridium thermocellum: investigation of the exo-mode of hydrolysis, and binding capacity to crystalline cellulose. Microbiology 148:247-255).

The endoglucanase from Acidothermus cellulolyticus has an insertion that is in close proximity to the substrate binding site. This insertion could form a loop which encloses the substrate binding site, thus converting this enzyme from an endoglucanase to a cellobiohydrolase. When enzymes tested on cellohexaose produce mainly cellobiose with a smaller amount of cellotriose, this can be explained by the fact that cellobiohydrolases have the capability to produce both cellobiose and cellotriose from a cellohexaose substrate (Harjunpaa (1996) Cello-oligosaccharide hydrolysis by cellobiohydrolase II from Trichoderma reesei. Association and rate constants derived from an analysis of progress curves. Eur. J Biochem 240:584-591).

Example 3 Sequence Based Discovery

The invention provides methods for identifying and isolating cellulases, e.g., cellobiohydrolases, using sequences of the invention. In one exemplary method, primers that are homologous to conserved regions of three glycosyl hydrolase families that contain cellobiohydrolases can be used to screen either polynucleotide libraries or DNA derived from fungal samples. For example, primers can be designed towards family 48 conserved regions and towards family 6 and family 7. Fungal libraries can be screened with these primers.

Example 4 Genetic Engineering of an Enzyme with Cellobiohydrolase Activity

This example describes the genetic engineering of an exemplary enzyme of the invention. This enzyme can be used in the conversion of biomass to fuels and chemicals, and for making effective and sustainable alternatives to petroleum-based products. This enzyme can be expressed in organisms (e.g., microorganisms, such as bacteria) for its participation in chemical cycles involving natural biomass conversion. In one aspect, this enzyme is used in “enzyme ensembles” for the efficient depolymerization of cellulosic and hemicellulosic polymers to metabolizable carbon moieties. As discussed above, the invention provides methods for discovering and implementing the most effective of enzymes to enable these important new “biomass conversion” and alternative energy industrial processes.

Metagenomic discovery and a non-stochastic method of directed evolution (called “DIRECTEVOLUTION®”, as described, e.g., in U.S. Pat. No. 6,939,689), which includes Gene Site Saturation Mutagenesis (GSSM) (as discussed above, see also U.S. Pat. Nos. 6,171,820 and 6,579,258) and Tunable GeneReassembly (TGR) (see, e.g., U.S. Pat. No. 6,537,776) technologies, can be used for the discovery and optimization of an enzyme component for cellulose reduction to glucose, i.e., a cellobiohydrolase.

An enzyme discovery screen can be implemented using Diversa Corporation's GIGAMATRIX™ high throughput expression screening platform (discussed above) to identify cellobiohydrolases using methylumbelliferyl cellobioside as substrate. Hits can be characterized for activity against AVICEL® Microcrystalline Cellulose (MCC) (FMC Corporation, Philadelphia, Pa.). An enzyme can be chosen as a candidate for optimization using Gene Site Saturation Mutagenesis (GSSM) technology. However, before performing GSSM evolution, the signal sequence, if present, can be removed and a starting methionine added. As discussed above, GSSM technology can rapidly mutate all amino acids in the protein to the 19 other amino acids in a sequential fashion. Mutants can be screened using a fiber-based assay and potential upmutants representing single amino acid changes can be identified. These upmutants can be combined into a new library representing combinations of the upmutants. This library can be screened, resulting in identification of several candidate enzymes for commercialization.
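The scale of a screen that changes every residue to each of the 19 other amino acids can be illustrated with a short sketch. This is only a protein-level illustration under stated assumptions: the function names and the five-residue toy sequence are hypothetical, and the GSSM procedure itself operates on the encoding nucleic acid and is described in the patents cited above, not here.

```python
# Hypothetical illustration of single-site saturation at the protein level.
# It only enumerates the variants; it is not the GSSM protocol.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_site_variants(protein: str):
    """Yield (position, wild_type, substitute, sequence) for every single change.

    Positions are 1-based. Each residue is replaced in turn by each of the
    19 other standard amino acids.
    """
    for i, wild_type in enumerate(protein):
        for substitute in AMINO_ACIDS:
            if substitute != wild_type:
                mutated = protein[:i] + substitute + protein[i + 1:]
                yield i + 1, wild_type, substitute, mutated


def library_size(protein: str) -> int:
    """Number of single-residue variants: 19 substitutions per position."""
    return 19 * len(protein)


toy = "MKTAY"  # made-up sequence, for illustration only
variants = list(single_site_variants(toy))
print(len(variants), library_size(toy))
```

For a protein of N residues this gives 19 × N single-change variants, which is why an automated screen is used to test them.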

Research Summary

GIGAMATRIX™ Screen

The GIGAMATRIX™ (GMx) screening platform is an ultra-high throughput method based on a 100,000 well microplate with the dimensions of a conventional 96 well plate (see Phase II application for details). The screen works with fluorescent substrates. The GMx screen can be implemented using 2 substrates based on previously shown activity by cellulases. Methylumbelliferyl cellobioside (MUC) can be used as the screening substrate. In addition, resorufin-beta-glucopyranoside can also be included in the screen in order to eliminate clones that have activity on both substrates and are presumed to be beta-glucosidases.

Amplified phage or phagemid versions of the target libraries can be screened. Host strains lacking beta-galactosidase genes can be used in order to decrease endogenous host activity on the substrates. Libraries can be chosen for screening based on the fact that these libraries yielded cellulase hits from a previous screening program.

Secondary screening can consist of plating the clones on agar plates and then colony picking into 384 well plates containing media and methylumbelliferyl cellobioside (MUC), a step termed a “breakout”. FIG. 10 illustrates in graphic form data showing a typical GIGAMATRIX™ (GMx) breakout. To generate this data, active clones against MUC (i.e., able to hydrolyze methylumbelliferyl cellobioside) are differentiated from a background of inactive clones. Individual clones were then grown overnight, fluorescence was measured and the most active hits were picked for sequencing. In FIG. 10, the X axis shows sample name; the Y axis is relative fluorescent units. Positive “hits” were plated onto agar plates and then colony picked into 384 well plates containing LB+antibiotic plus 50 μM MUC and grown overnight.

Characterization

Genes discovered in the GIGAMATRIX™ screen can be sequenced and the data analyzed. Open reading frames (ORFs) can be annotated using a software system. The ORFs can be subcloned into the appropriate vector(s) with the introduction of DNA encoding C-terminal His-tags. Construct DNA can be transformed into the appropriate E. coli host(s) and expressed for characterization studies. The gene products can be screened against phosphoric acid-swollen cellulose (PASC). PASC is crystalline cellulose that is made more amorphous through swelling by acid treatment. PASC was prepared from AVICEL® Microcrystalline Cellulose (MCC). Subclones can be grown, expressed and lysed. Lysates can be incubated with PASC and the reaction products analyzed using the bicinchoninic acid (BCA) reducing sugar assay. The most active subclones can be selected for larger scale growth and purification. The specific activity of these subclones can be determined on PASC.

The subclones can also be analyzed by capillary electrophoresis (CE). Lysates can be incubated with substrate for 30 hours. The reaction products can be derivatized with the fluorophore 1-aminopyrene-3,6,8-trisulfonate (APTS). The products can be analyzed using a 48 cm capillary. Cellobiose elutes at 6 minutes. The results can show that several enzymes have reaction product profiles representative of processive enzymes. A processive enzyme is defined as having a ratio of cellobiose/(glucose+cellotriose) ≥ 10.
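The processivity criterion stated above can be written as a small calculation. The following is a minimal sketch, assuming the three product amounts are supplied in the same units (e.g., CE peak areas or molar amounts); the function names and the handling of a zero denominator are illustrative assumptions, not part of the described method.

```python
# Sketch of the stated criterion: cellobiose / (glucose + cellotriose) >= 10.
# Inputs are assumed to be in the same (arbitrary) units.


def processivity_ratio(cellobiose: float, glucose: float, cellotriose: float) -> float:
    """Return cellobiose / (glucose + cellotriose)."""
    if min(cellobiose, glucose, cellotriose) < 0:
        raise ValueError("product amounts must be non-negative")
    denominator = glucose + cellotriose
    if denominator == 0:
        if cellobiose == 0:
            raise ValueError("no products detected")
        return float("inf")  # only cellobiose was detected
    return cellobiose / denominator


def is_processive(cellobiose: float, glucose: float, cellotriose: float,
                  threshold: float = 10.0) -> bool:
    """Apply the ratio threshold given in the text (default 10)."""
    return processivity_ratio(cellobiose, glucose, cellotriose) >= threshold


# Made-up amounts, for illustration only.
print(is_processive(50.0, 2.0, 3.0))   # ratio 10.0
print(is_processive(50.0, 3.0, 3.0))   # ratio about 8.3
```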

Fungal CBHs in Pichia

Genes of newly discovered cellobiohydrolases can be transformed into P. pastoris and the transformations can be spread onto solid agar plates. The samples can be grown and induced and the supernatants incubated with PASC in the presence of a β-glucosidase. The reaction products can be analyzed using the glucose-oxidase assay. A glycosyl hydrolase family 6 cellobiohydrolase was successfully heterologously expressed in P. pastoris.

GSSM Screening

GSSM technology (discussed above) was used to rapidly and sequentially mutate the amino acids of the catalytic and carbohydrate binding domain of the target protein into the 19 other amino acids. In addition, variants of a wild-type enzyme can be tested to determine the effects of the domains on activity. For example, the wild-type enzyme can be subcloned with: 1) the catalytic domain alone (CD); 2) the catalytic and carbohydrate binding domain (CCD); and 3) the catalytic and carbohydrate binding domain plus downstream amino acids (CCD+DS). The full-length protein and the variants can be assayed on AVICEL® Microcrystalline Cellulose (MCC) and the reaction products analyzed by the BCA reducing sugar assay.

The goal of the GSSM screen was to identify mutants that increased the extent of hydrolysis on insoluble microcrystalline cellulose. A robotic screening method was developed to facilitate the GSSM screening process.

DNA from mutation constructs can be transformed into DH10b host cells. Individual colonies can be picked into 96 well (shallow) plates containing 150 uL LB/Ampicillin using the automatic colony picking system. The plates can be incubated for 24 hours at 37° C., 400 rpm. 15 uL of culture is then transferred from each well into an induction plate. Each well of the induction plate contained 135 uL LB/Ampicillin with 1.1 mM IPTG. The induction plates can be incubated for 24 hours at 37° C., 400 rpm. The plates can be centrifuged and the supernatant discarded.

The automated portion of the assay can be used at this point. The cells can be lysed and resuspended by the robot. 150 uL of lysis buffer (125 uL water plus 25 uL BPER containing 0.2 mg/ml lysozyme and 20 unit/ml DNase I) can be added to each well. 15 uL lysate is then transferred from each well to a reaction plate. Each well of the reaction plate can contain 185 uL of a reaction mix (1% AVICEL® Microcrystalline Cellulose (MCC), 50 mM sodium acetate buffer pH 5.0). The reaction plates can be incubated at 37° C. for 30 hours with 95% humidity. After incubation, the plates can be centrifuged and 15 uL supernatant transferred to BCA plates. The BCA plates can contain 50 uL reagent A, 50 uL reagent B, and 80 uL 400 mM Carbonate buffer, pH 10 per well. The plates can be covered with rubber seals and incubated at 80° C. for 30 minutes, then cooled by centrifugation and the absorbance read at A560.
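The per-well volumes given for the induction, reaction and BCA plates imply final concentrations that are easy to check by simple dilution arithmetic. The sketch below is illustrative only: it assumes the volumes are additive and that the stated concentrations refer to the component stocks before the 15 uL transfer is added; the helper name `final_concentration` is hypothetical.

```python
# Dilution arithmetic for the per-well volumes stated in the text.
# Assumes additive volumes; stock concentrations are those given above.


def final_concentration(stock: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """C_final = C_stock * V_stock / V_total (same concentration units as stock)."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock * stock_volume_ul / total_volume_ul


# Induction plate: 15 uL culture + 135 uL LB/Ampicillin containing 1.1 mM IPTG.
induction_total_ul = 15 + 135
iptg_mm = final_concentration(1.1, 135, induction_total_ul)

# Reaction plate: 15 uL lysate + 185 uL mix (1% MCC, 50 mM sodium acetate).
reaction_total_ul = 15 + 185
mcc_percent = final_concentration(1.0, 185, reaction_total_ul)
acetate_mm = final_concentration(50.0, 185, reaction_total_ul)

# BCA plate: 15 uL supernatant + 50 uL reagent A + 50 uL reagent B
# + 80 uL 400 mM carbonate buffer.
bca_total_ul = 15 + 50 + 50 + 80
carbonate_mm = final_concentration(400.0, 80, bca_total_ul)
sample_dilution = bca_total_ul / 15  # fold dilution of the supernatant

print(f"IPTG in induction well: {iptg_mm:.2f} mM")
print(f"MCC in reaction well: {mcc_percent:.3f} %, acetate: {acetate_mm:.2f} mM")
print(f"Carbonate in BCA well: {carbonate_mm:.1f} mM, sample diluted {sample_dilution:.0f}-fold")
```

Under these assumptions the induction well holds 150 uL, the reaction well 200 uL and the BCA well 195 uL, so the supernatant is diluted 13-fold before the absorbance is read.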

Primary hits can be reconfirmed in a secondary assay. This assay can be the same as the primary screen. Hits from the secondary screen can be further analyzed. The GSSM upmutants can be mapped onto the crystal structure of known enzymes of the same class. Samples can be prioritized based on amino acid location, amino acid change and the fold improvement score. Upmutants can then be selected from the GSSM screening and selected for gene reassembly evolution, i.e., Tunable GeneReassembly (TGR), discussed above, and also see, e.g., U.S. Pat. No. 6,537,776.
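One way a fold improvement score could be computed from the A560 readings is sketched below. The text does not define the score, so the definition used here (blank-subtracted mutant signal divided by blank-subtracted wild-type signal) is an assumption, and the variant labels and absorbance values are made up for illustration.

```python
# Assumed definition, not taken from the text:
#   fold improvement = (A560 mutant - A560 blank) / (A560 wild type - A560 blank)


def fold_improvement(mutant_a560: float, wildtype_a560: float, blank_a560: float = 0.0) -> float:
    """Blank-subtracted mutant signal relative to blank-subtracted wild type."""
    wildtype_signal = wildtype_a560 - blank_a560
    if wildtype_signal <= 0:
        raise ValueError("wild-type signal must exceed the blank")
    return (mutant_a560 - blank_a560) / wildtype_signal


def rank_hits(readings: dict, wildtype_a560: float, blank_a560: float = 0.0,
              cutoff: float = 1.0) -> list:
    """Return (variant, score) pairs scoring above cutoff, best first."""
    scored = [(name, fold_improvement(a560, wildtype_a560, blank_a560))
              for name, a560 in readings.items()]
    hits = [pair for pair in scored if pair[1] > cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical variant labels and absorbance readings.
example_readings = {"A10G": 0.50, "T45S": 0.30, "L7P": 0.90}
ranked = rank_hits(example_readings, wildtype_a560=0.40, blank_a560=0.10)
for name, score in ranked:
    print(f"{name}: {score:.2f}-fold")
```

Location of the change and the identity of the substituted residue, which the text also names as prioritization criteria, are not modeled in this sketch.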

Blending of Upmutants

Using gene reassembly (Tunable GeneReassembly (TGR)) technology, GSSM upmutants can be blended in order to identify the candidate with the best activity. Activity assays can be the same as for the GSSM screening, except that reactions can be further diluted to account for increased activity of upmutants over the wildtype enzyme.

Example 5 Enzyme Mixtures, or “Cocktails” for Processing/Converting Biomass

This example describes the development of enzyme mixtures, or “cocktails”, to digest biomass, including ammonia-pretreated biomass, into fermentable sugars. In one aspect, the enzyme mixtures, or “cocktails”, comprise at least one exemplary enzyme of the invention.

The enzyme mixtures, or “cocktails”, of the invention are used to hydrolyze cellulose or any β-1,4-linked glucose moieties and/or hemicellulose or any branched polymer comprising a β-1,4-linked xylose backbone with branches of arabinose, galactose, mannose, glucuronic acid, and/or linkages to lignin, e.g., via ferulic acid ester groups. Thus, in various aspects, the methods and compositions of the invention address the complexity and problems of digestion of hemicellulose to monomer sugars due to the variability of sugars and linkages.

Prior attempts to develop enzymes for the biorefinery have not been successful for two main reasons. First, current enzyme usage rates are very high (approx. 100 g enzyme/gal ethanol), resulting in high production costs. There have been minimal efforts to improve the performance of the cellulase enzymes. In general the focus has been to reduce enzyme cost by increasing production yields in Trichoderma reesei, thus improving fermentation economics. However, in order for the biorefinery to become commercial, enzyme usage rates must be dramatically lower. Second, non-commercial thermochemical pretreatment conditions were used on the corn stover feedstock; therefore minimal effort has been expended on enzymatic digestion of hemicellulose. We have observed, and it has been reported, that effective digestion of the hemicellulose component improves the rate of cellulose digestion.

In one aspect, an enzyme mixture, or cocktail, comprising (consisting of) nine specific proteins (enzymes) has been developed to provide maximal cellulose and hemicellulose digestion of ammonia pretreated biomass, for example, in this exemplary process, for the conversion of ammonia pretreated corn cob. The cocktail contains (comprises) a minimal set of enzymes (see below) active on cellulose, i.e., an endoglucanase, a cellobiohydrolase I, a cellobiohydrolase II and a β-glucosidase; and five (5) enzymes active on hemicellulose, i.e., a xylanase GH11, a xylanase GH10, a β-xylosidase, an arabinofuranosidase GH51 and an arabinofuranosidase GH62. This cocktail was developed by screening enzyme libraries for the ability to release soluble reducing sugars individually and in concert with each other. The table below details the enzymes, their specific class and the usage in a typical hydrolysis experiment.

Enzyme                                               Class                       Usage (mg/g cellulose)
EG1_CDCBM3                                           Endoglucanase               1.7
SEQ ID NO: 98 (encoded by, e.g., SEQ ID NO: 97)      Cellobiohydrolase II        1
SEQ ID NO: 34 (encoded by, e.g., SEQ ID NO: 33)      Cellobiohydrolase I         10
SEQ ID NO: 94 (encoded by, e.g., SEQ ID NO: 93)      β-glucosidase               2.6
SEQ ID NO: 100 (encoded by, e.g., SEQ ID NO: 99)     Endoxylanase GH11           0.6
SEQ ID NO: 102 (encoded by, e.g., SEQ ID NO: 101)    Endoxylanase GH10           0.2
SEQ ID NO: 96 (encoded by, e.g., SEQ ID NO: 95)      β-xylosidase                0.5
SEQ ID NO: 92 (encoded by, e.g., SEQ ID NO: 91)      Arabinofuranosidase         0.3
SEQ ID NO: 104 (encoded by, e.g., SEQ ID NO: 103)    Arabinofuranosidase GH62    2.0

A representative progress curve is shown in FIG. 16 using the recipe shown in the table above (e.g., at the usage amounts of enzyme in mg (enzyme)/g cellulose) at 5% solids pretreated corn cob (designated “Jaygo 2” in the figure) at pH 5.5 and 50° C.
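The usage amounts in the table above can be totalled and scaled to a given amount of cellulose. The sketch below takes the nine usage values directly from the table; the grouping into cellulose-active and hemicellulose-active enzymes follows the description of the cocktail, while the dictionary keys, the function name and the 2.5 g cellulose example are illustrative assumptions.

```python
# Usage values (mg enzyme per g cellulose) copied from the table above.
# Keys are shorthand labels chosen for this sketch.

CELLULOSE_ACTIVE = {
    "endoglucanase (EG1_CDCBM3)": 1.7,
    "cellobiohydrolase II (SEQ ID NO: 98)": 1.0,
    "cellobiohydrolase I (SEQ ID NO: 34)": 10.0,
    "beta-glucosidase (SEQ ID NO: 94)": 2.6,
}

HEMICELLULOSE_ACTIVE = {
    "endoxylanase GH11 (SEQ ID NO: 100)": 0.6,
    "endoxylanase GH10 (SEQ ID NO: 102)": 0.2,
    "beta-xylosidase (SEQ ID NO: 96)": 0.5,
    "arabinofuranosidase (SEQ ID NO: 92)": 0.3,
    "arabinofuranosidase GH62 (SEQ ID NO: 104)": 2.0,
}


def enzyme_doses_mg(cellulose_g: float) -> dict:
    """Scale every usage value to the given mass of cellulose (result in mg)."""
    recipe = {**CELLULOSE_ACTIVE, **HEMICELLULOSE_ACTIVE}
    return {name: usage * cellulose_g for name, usage in recipe.items()}


cellulase_total = sum(CELLULOSE_ACTIVE.values())
hemicellulase_total = sum(HEMICELLULOSE_ACTIVE.values())
total_loading = cellulase_total + hemicellulase_total

print(f"cellulose-active: {cellulase_total:.1f} mg/g cellulose")
print(f"hemicellulose-active: {hemicellulase_total:.1f} mg/g cellulose")
print(f"total: {total_loading:.1f} mg/g cellulose")

# Example: doses for a hypothetical reaction containing 2.5 g cellulose.
for name, dose in enzyme_doses_mg(2.5).items():
    print(f"{name}: {dose:.2f} mg")
```

With the values in the table, the cellulose-active enzymes account for most of the total protein loading, and cellobiohydrolase I is the largest single component.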

This enzymatic mixture, or “cocktail”, of hemicellulose- and cellulose-hydrolyzing enzymes can be used with or as a substitution for commercial cellulase preparations, e.g., those derived from crude fungal culture broths, such as Trichoderma reesei.

The development of the cocktails of the invention started with the discovery of an organism that was observed to grow and digest cellulosic materials. Over the past several decades classical strain development was used to optimize these strains as cellulase producers; in general this resulted in hypersecretor strains without specifically improving the enzymes themselves. These mixtures contain some redundant and unnecessary proteins, and are deficient in many enzyme activities that are required to digest alkaline pretreated biomass. In general, fungal culture broths have been optimized and/or tailored to acid-treated biomass which no longer contains polymeric hemicellulose; hence the preparations are enriched in cellulase activity while deficient in hemicellulase activity.

In contrast, in one aspect, methods of the invention use a pretreatment process based on dilute ammonia to hydrolyze ester linkages but not glycosidic linkages, thereby resulting in intact insoluble hemicellulose. These exemplary enzyme cocktails of the invention therefore comprise both cellulase and hemicellulase activities to completely release glucose, xylose and arabinose. For example, in one aspect, enzyme cocktails of the invention can digest both cellulose and the hemicellulose component of ammonia-pretreated corn cob and stover (corn stover is the residue that is left behind after corn grain harvest). Removal of the hemicellulose “sheath” surrounding the cellulose fibers by the enzyme “cocktails” of the invention will enhance cellulase activity and improve overall performance.

In one aspect, the enzyme mixtures of the invention are used in biorefineries at enzyme usage rates that are lower than the approximately 100 g enzyme/gal ethanol now used in bioethanol production processes, thus resulting in lower production costs. These compositions and enzymes of the invention can be used to reduce enzyme costs in biorefineries and improve fermentation economics.

In another aspect, the compositions and methods of the invention are used in conjunction with non-commercial thermochemical pretreatment conditions, e.g., to treat biomass such as corn stover feedstock. The incorporation of the enzymatic digestion of hemicellulose in practicing the compositions and methods of the invention makes processes using these compositions and methods particularly efficient. The effective digestion of a hemicellulose component in a biomass, e.g., a plant biomass, can improve the rate of cellulose digestion.

In developing this aspect of the invention, high throughput screens were established to survey glycosyl hydrolases for effectiveness of hydrolysis of model substrates, e.g., AVICEL® microcrystalline cellulose and alkaline pretreated corn stover. In addition, enzymes in relatively under-represented enzyme classes such as the cellobiohydrolases and β-glucosidases were investigated. Cellobiohydrolase discovery focused on sequence-based (hybridization) methods and fungal gene libraries, while β-glucosidase discovery focused on activity-based methods and bacterial gene libraries. Once individual enzymes were found to be efficacious on pretreated biomass, combinations of enzymes were tested for enhanced or synergistic performance. This included enzymes that hydrolyze cellulose and hemicellulose.

In developing this aspect of the invention, it was recognized that removal of the hemicellulose “sheath” surrounding cellulose had a positive effect on cellulose hydrolysis. The endo-xylanases cleaved insoluble hemicellulose into soluble oligosaccharides, which effectively removed the barrier to cellulose. Additionally, recalcitrant oligosaccharides were characterized to determine their composition so that enzymes with the appropriate specificity could be found that would convert the oligosaccharides into monomer sugar(s). Hence the enzyme cocktails of the invention are tailored to the feedstock and the pretreatment chemistry. A minimal set of enzymes has been designed to attack specifically the linkages present in the pretreated biomass.

In one aspect, the enzyme cocktails of the invention are specifically tailored to a pretreatment chemistry to provide an optimal solution to enzymatic digestion. In one aspect, the enzyme mixtures, or cocktails, of the invention are advantageous in that they have no redundant protein enzymes. Additionally, all cellulose linkages and sugars are addressed, in contrast to natural enzyme mixtures found in a native system which were not developed or modified to work on a non-natural substrate.

Another aspect of the invention comprises an enzyme cocktail (designated “E9”) that efficiently hydrolyzes cellulose and hemicellulose from biomass.

| Enzyme | Protein in cocktail (mg/ml) |
|---|---|
| SEQ ID NO: 106 (encoded by, e.g., SEQ ID NO: 105) | 1 |
| SEQ ID NO: 264 (encoded by, e.g., SEQ ID NO: 263) | 0.6 |
| CBH I | 0.05 |
| CBH II | 0.05 |
| SEQ ID NO: 100 (encoded by, e.g., SEQ ID NO: 99) | 1 |
| SEQ ID NO: 96 (encoded by, e.g., SEQ ID NO: 95) | 1 |
| SEQ ID NO: 92 (encoded by, e.g., SEQ ID NO: 91) | 1 |
| SEQ ID NO: 440 (encoded by, e.g., SEQ ID NO: 439) | 1 |
| SEQ ID NO: 442 (encoded by, e.g., SEQ ID NO: 441) | 1 |

This cocktail of 9 enzymes releases 78% and 62% of the theoretical glucose and xylose, respectively, in 48 hrs from an alkaline pretreated corn cob sample. Furthermore, this 9 enzyme cocktail of the invention outperforms an industrial standard, Genencor's SPEZYME® cellulase (Genencor International, Inc., Palo Alto, CA), under the same reaction conditions.

The strategy employed here to develop an efficient enzyme cocktail was to screen individual glycosyl hydrolases and non-glycosyl hydrolases on process relevant substrates and then combine them to effect maximal monomer yield. Approximately 150 endoglucanases were screened on crystalline cellulose (AVICEL® Microcrystalline Cellulose (MCC)) and a variety of pretreated corn stover (PCS) samples. The pretreated samples included dilute acid pretreated corn stover (PCS), steam PCS, “high severity” alkaline PCS, “low severity” alkaline PCS, “medium severity” alkaline PCS and alkaline soaked pretreated cobs. Products were analyzed by either a general reducing sugar assay (BCA) or by direct chromatographic detection (e.g., HPLC with a refractive index detector). An example of results from such a screen is shown in FIG. 21. FIG. 21 illustrates in graphic form data showing the release of glucose at 48 h from pretreated corn stover (cob) samples by 20 different endoglucanases.

These experiments identified the exemplary SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) as a particularly efficient endoglucanase (see also Example 7, below). Eventually close to 100 enzymes were evaluated as potential components of this exemplary cocktail of the invention. These enzymes were assayed on dye-labeled glucopyranoside at various conditions to determine their optimal pHs and temperatures, as illustrated in FIG. 22. FIG. 22 illustrates in graphic form data showing temperature and pH optima of 76 β-glucosidases on p-nitrophenyl-β-glucopyranoside.

Three enzymes, SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263), SEQ ID NO:94 (encoded by, e.g., SEQ ID NO:93) and SEQ ID NO:388 (encoded by, e.g., SEQ ID NO:387), had the highest specific activity. However, additional experiments showed that SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263) had the most activity on the substrate cellobiose. SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263) showed optimal activity at pH 5 and 80° C. and was chosen as the top candidate β-glucosidase for this exemplary cocktail of the invention.

Cellobiohydrolases (CBHs) are well known in the literature and several are commercially available; two such enzymes, CBH I (a family 7 glycosyl hydrolase) and CBH II (a family 6 glycosyl hydrolase), were chosen to be included in this exemplary cocktail of the invention. Fungal CBH I and CBH II (both from Trichoderma longibrachiatum) were purchased directly from the enzyme supplier Megazyme International (Bray, Ireland) (catalog numbers E-CBHI and E-CBHII) and included in the cocktails at the standard concentration of 0.05 mg/mL. The addition of the cellobiohydrolases to the cocktail greatly enhanced the overall release of glucose from pretreated corn stover (PCS).

Two xylanases, SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99) and SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443), perform well on soluble and insoluble substrates. Both enzymes are family 11 glycosyl hydrolases. SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) released close to 50% of the available xylose from high severity alkaline pretreated corn stover (PCS) (see FIG. 23) at pH 5 and 50° C. Increased enzyme dose increases the rate of hydrolysis with little effect on extent. Furthermore, it was discovered that SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99) and SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) gave almost equivalent performance. FIG. 23 illustrates in graphic form data showing the digestion of high severity alkaline PCS (2.2% solids) by 3 different enzyme loads of the exemplary xylanase SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443).

The residual solids isolated after SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) digestion were used to screen approx. 250 xylanases. At least six of these screened xylanases were able to further digest this material. All six were family 10 glycosyl hydrolases (see, e.g., Charnock (1998) J. Biol. Chem. 273:32187-32199). These enzymes were subjected to detailed analysis and it was determined that SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101) and SEQ ID NO:448 (encoded by, e.g., SEQ ID NO:447) were the best performers, at least in this assay.

Analysis of the xylanase-generated reaction products showed accumulation of soluble xylooligosaccharides (mainly xylobiose); therefore several β-xylosidases were screened at a variety of temperatures and pHs, the results of which are illustrated in FIG. 24. FIG. 24 illustrates data from the hydrolysis of xylobiose by eight xylosidases at either 50° C. or 37° C.

The exemplary β-xylosidase SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95) was active and stable at 50° C. and was therefore chosen to complement the xylanase. Many other hemicellulases were screened that enhanced the release of xylose from alkaline pretreated corn stover (PCS). As is evident from the data illustrated in FIG. 25, the exemplary arabinofuranosidase SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91) not only releases arabinose from PCS but also enhances the activity of xylanase and xylosidase, thereby releasing more xylose. FIG. 25 illustrates data showing the release of xylose and arabinose from high severity alkaline PCS (2.2% solids) by combinations of xylanase, xylosidase and arabinofuranosidase.

The presence of a ferulic acid esterase (FAE) and an α-glucuronidase (aGlucUr) also contributed to enhanced xylose and arabinose release. The exemplary enzymes SEQ ID NO:440 (encoded by, e.g., SEQ ID NO:439) (FAE) and SEQ ID NO:442 (encoded by, e.g., SEQ ID NO:441) (aGlucUr) were identified as the top candidates in this particular application.

The performance of combinations of all the above mentioned enzymes was tested on alkaline pretreated corn stover (PCS) under standard assay conditions (pH 5, 50° C.). Monomer sugar concentrations were measured by HPLC analysis at regular time points during the reaction (6, 20, 30 and 48 hrs). Sugar concentrations were converted into percent conversion based on composition data. FIG. 26 compares the performance of several combinations. Combinations E2 through E9 were tested on “low severity alk PCS” whereas E9.1 is the nine enzyme cocktail tested on alkaline pretreated cobs (“Jaygo 1”). The E9 mixture was made up from crude cell free extracts and was composed of the enzyme cocktail designated “E9”, as discussed above and listed in FIG. 26. FIG. 26 lists the enzyme cocktails of the invention designated E2, E4.1, E4.2, E6, E7 and E9. FIG. 26 also illustrates data showing the performance of the E2, E4.1, E4.2, E6, E7 and E9 enzyme cocktails of the invention on low severity alk PCS and alkaline pretreated cobs (2.2% solids). Percent conversion was calculated based on determination of sugar monomer in the soluble fraction and compositional analysis.

The industrial enzyme standard in biomass conversion is SPEZYME® cellulase (also discussed above). FIG. 27 in table form compares data from SPEZYME® cellulase and the exemplary enzyme cocktail of the invention designated E9 on four different pretreated corn samples. FIG. 27 illustrates a Table comparing the performance of the exemplary enzyme cocktail E9 to a typical loading (15 Filter Paper Units per gram of cellulose) of SPEZYME® cellulase on a variety of pretreated corn samples (the designations “Jaygo 1”, “Jaygo 2” and “Jaygo 4” are cobs and “Jaygo 5” is stover). Sugar concentrations were determined after 48 hr incubation. Clearly the exemplary enzyme cocktail E9 releases more xylose than SPEZYME® cellulase and in most cases releases more glucose.

Enzyme cocktails of the invention also were developed and optimized for performance on certain substrates (e.g., complete hydrolysis of lignocellulosic materials) and for the subsequent yields of glucose and xylose. In one aspect, enzyme cocktails of the invention when incubated with an appropriate pretreated biomass feedstock have the following performance characteristics: in 48 hrs release 75% and 40% of theoretical glucose and xylose, respectively, using 5% solids and 20 mg/g cellulose.

Several different classes of enzymes were combined in appropriate ratios. These cocktails are referred to as “EX” where “X” is the number of enzymes combined. Performance was monitored by reacting the various enzyme cocktails with a pretreated biomass sample and measuring sugars in the liquid phase. In addition, since crude cell free extracts were used in the cocktails, it was necessary to develop methodologies to accurately assess the amount of active enzyme present. To this end, each enzyme was purified and specific activity of the pure (or enriched) protein was used to estimate the level of active protein in the crude mixture. The table below details performance levels achieved.

| Performance Parameters | Benchmark SPEZYME® enzyme* | Case 1¹ | Case 2¹ | Case 3¹ |
|---|---|---|---|---|
| mg active enzyme/g cellulose | 20³ | 18.4 | 19.2 | 17.2 |
| Glucose: % Conversion | 80 | 76 | 79 | 76 |
| Glucose: Time for conversion (hr) | 48 | 48 | 48 | 48 |
| Xylose: % Conversion | 65 | 57² | 58² | 59² |
| Xylose: Time for conversion (hr) | 48 | <20 | <20 | <20 |
| % Solids | 2.5 | 5 | 5 | 5 |

*Performance of SPEZYME® enzyme (15 FPU) on corn stover receiving the ‘severe’ alkaline pretreatment (15% NH₄OH, 170° C., 5 minute residence time) followed by disc-refining (0.010″ gap).

**At this point in time we are not monitoring expression levels nor are we attempting to improve specific activity. A more accurate active enzyme amount will be set after year 2 for subsequent years.

¹Case 1, 2 and 3 are different enzyme combinations. They are explained in the text below.

²Yields of xylose reached at the 20 hr time-point. Xylose concentration increases to approximately 63% between 20 and 48 hrs. More data is available in the text below.

³Current estimates for 15 FPU SPEZYME® cellulase correspond to approximately 58 mg protein.

Protein Purification

List of enzymes purified:

-   SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263): β-glucosidase
-   SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105): endoglucanase
-   SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99): family 11 xylanase
-   SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101): family 10 xylanase
-   SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95): β-xylosidase
-   SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91): α-arabinofuranosidase
-   SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97): family 6 cellobiohydrolase
-   SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33): family 7 cellobiohydrolase

SEQ ID NO:264—β-Glucosidase

β-glucosidase activity of the exemplary enzyme SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263) was assayed using the colorimetric substrate analog pNP-β-glucopyranoside. Activity was measured by monitoring absorption at 405 nm. The enzyme was purified using anion exchange chromatography resulting in an enrichment of activity from 20.8 U/mg protein to 179 U/mg protein (almost a 9-fold enrichment). SDS-PAGE and densitometry showed that the enriched protein was approximately 48% pure, as illustrated in FIG. 33. Using these two values it was estimated that the original sample contained approximately 5.6% active SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263). FIG. 33 illustrates an SDS-PAGE of the crude cell extract (“Load”) and the enriched protein following anion exchange chromatography (“Enriched”). Proteomics analysis showed that the bands labeled “dimer” and “trimer” are also the exemplary SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263) and were probably a result of anomalous behavior on SDS-PAGE.
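The estimate of active enzyme in the crude sample can be illustrated with a short calculation. This is only a sketch: it assumes the "two values" combined are the fold enrichment in specific activity and the densitometric purity of the enriched protein, and that the estimate is purity divided by fold enrichment. The function name and that formula are illustrative assumptions, not a method stated in the text; with the reported values for SEQ ID NO:264 they reproduce the approximately 5.6% figure.

```python
def percent_active_in_crude(crude_u_per_mg, enriched_u_per_mg, enriched_purity_pct):
    """Estimate the percent of crude protein that is active target enzyme.

    Assumption (not stated explicitly in the text): the estimate is the purity
    of the enriched fraction divided by the fold enrichment in specific activity.
    """
    fold_enrichment = enriched_u_per_mg / crude_u_per_mg
    return enriched_purity_pct / fold_enrichment


# Reported values for SEQ ID NO:264: 20.8 U/mg (crude), 179 U/mg (enriched), 48% pure.
estimate = percent_active_in_crude(20.8, 179.0, 48.0)
print(f"{estimate:.1f}% active enzyme in crude extract")  # prints 5.6%
```

Applied to the other purified enzymes, the same arithmetic gives values close to, but not always identical with, the reported percentages, which suggests the reported figures were derived from unrounded measurements.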

Endoglucanase activity of the exemplary SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) also was assayed using the surrogate substrate 4-methylumbelliferyl cellobioside. The enzyme was purified using anion exchange chromatography and heat treatment resulting in an enrichment of activity from 8.4×10⁴ U/mg protein to 3×10⁵ U/mg protein (a 3.6-fold enrichment). SDS-PAGE and densitometry showed that the enriched protein was approximately 10% pure, as illustrated in FIG. 34. Using these two values it was estimated that the original sample contained approximately 3% active SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105). FIG. 34 is an illustration of an SDS-PAGE of the crude cell extract (“Load”) and the enriched exemplary SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) following anion exchange chromatography (“Enriched”).

Xylanase activity of the exemplary SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99), characterized as a Family 11 xylanase, was assayed on wheat arabinoxylan with product detection using a reducing sugar assay (BCA). The enzyme was purified using cation exchange chromatography resulting in an enrichment of activity from 41.4 U/mg protein to 215 U/mg protein (a 5-fold enrichment). SDS-PAGE and densitometry showed that the enriched protein was approximately 80% pure, as illustrated in FIG. 35. Using these two values it was estimated that the original sample contained approximately 15.5% active SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99). FIG. 35 is an illustration of an SDS-PAGE of the crude cell extract (“Load”) and the enriched exemplary SEQ ID NO:100 (encoded by, e.g., SEQ ID NO:99) protein following cation exchange chromatography (“Enriched”).

Xylanase activity of the exemplary SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101), characterized as a Family 10 xylanase, was assayed on wheat arabinoxylan with product detection using a reducing sugar assay (BCA). The enzyme was purified using size exclusion chromatography resulting in an enrichment of activity from 10.5 U/mg protein to 42.7 U/mg protein (a 4-fold enrichment). SDS-PAGE and densitometry showed that the enriched protein was approximately 81.5% pure, as illustrated in FIG. 36. Using these two values it was estimated that the original sample contained approximately 18.7% active SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101). FIG. 36 is an illustration of an SDS-PAGE of the crude cell extract (“Load”) and the enriched exemplary SEQ ID NO:102 (encoded by, e.g., SEQ ID NO:101) protein following size exclusion chromatography (“Enriched”).

β-xylosidase activity of the exemplary SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95) was assayed using the fluorimetric substrate analog 4-methyl umbelliferyl-β-xylopyranoside. The enzyme was purified using anion exchange chromatography resulting in an enrichment of activity from 6 U/mg protein to 51.2 U/mg protein (an 8.5-fold enrichment). SDS-PAGE and densitometry showed that the enriched protein was approximately 21% pure, as illustrated in FIG. 37. Using these two values it was estimated that the original sample contained approximately 2.7% active exemplary SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95). FIG. 37 is an illustration of an SDS-PAGE of the crude cell extract (“Load”) and the enriched exemplary SEQ ID NO:96 (encoded by, e.g., SEQ ID NO:95) protein having β-xylosidase activity following anion exchange chromatography (“Enriched”).

Arabinofuranosidase activity of the exemplary SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91) was assayed using the fluorimetric substrate analog 4-methyl umbelliferyl-α-arabinofuranoside. The enzyme was purified using anion exchange chromatography resulting in an enrichment of activity from 7.2×10⁶ U/mg protein to 9.8×10⁶ U/mg protein (a 1.4-fold enrichment). SDS-PAGE and densitometry showed that the enriched protein was approximately 50% pure, as illustrated in FIG. 38. Using these two values it was estimated that the original sample contained approximately 34% active SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91). FIG. 38 is an illustration of an SDS-PAGE of the crude cell extract (“Load”) and the enriched exemplary SEQ ID NO:92 (encoded by, e.g., SEQ ID NO:91) protein having arabinofuranosidase activity following anion exchange chromatography (“Enriched”).

Cellobiohydrolase activity of the exemplary SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97), a family 6 cellobiohydrolase, was assayed on phosphoric acid swollen cellulose (PASC). Product was detected by a coupled assay with β-glucosidase, glucose oxidase and horseradish peroxidase. The enzyme was purified from the secreted protein of a Cochliobolus heterostrophus strain containing the CBH gene inserted into the chromosome. An affinity ligand, p-aminophenyl-β-cellobioside, was developed to isolate the protein from endogenous protein. SDS-PAGE and densitometry showed that the enriched protein was approximately 46% pure, as illustrated in FIG. 39. FIG. 39 is an illustration of an SDS-PAGE of the exemplary SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) having cellobiohydrolase activity enriched on a PAPC affinity ligand.

Unlike the family 6 cellobiohydrolases, family 7 cellobiohydrolase enzymes are active on the dye labeled substrate analog 4-methyl umbelliferyl-β-lactoside; therefore the exemplary SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) was assayed on this substrate. The enzyme was purified from the secreted protein of a Cochliobolus heterostrophus strain containing the CBH gene inserted into the chromosome. Size exclusion chromatography was used to separate SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) from endogenous proteins. SDS-PAGE and densitometry showed that the enriched protein was approximately 20% pure, as illustrated in FIG. 40. FIG. 40 is an illustration of an SDS-PAGE of the exemplary family 7 cellobiohydrolase SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) enriched on size exclusion chromatography.

Trichoderma reesei CBH I and II: Megazyme International (Bray, Ireland) sells preparations of T. reesei CBH I and II. SDS-PAGE was used to estimate the level of purity for each of these enzymes. Prior to use they were dialyzed to remove ammonium sulfate present in the preparations. The table below summarizes the results of purification of each enzyme and lists the estimated percent of active enzyme in the crude preparations used in the biomass-degrading cocktails.

| Enzyme | % active enzyme |
|---|---|
| SEQ ID NO: 264 (encoded by, e.g., SEQ ID NO: 263) | 5.6 |
| SEQ ID NO: 106 (encoded by, e.g., SEQ ID NO: 105) | 3 |
| SEQ ID NO: 100 (encoded by, e.g., SEQ ID NO: 99) | 15.5 |
| SEQ ID NO: 102 (encoded by, e.g., SEQ ID NO: 101) | 18.7 |
| SEQ ID NO: 96 (encoded by, e.g., SEQ ID NO: 95) | 2.7 |
| SEQ ID NO: 92 (encoded by, e.g., SEQ ID NO: 91) | 34 |
| SEQ ID NO: 98 (encoded by, e.g., SEQ ID NO: 97) | 46 |
| SEQ ID NO: 34 (encoded by, e.g., SEQ ID NO: 33) | 20 |
| Tr CBH I | 87 |
| Tr CBH II | 51 |

Enzymatic Digestion of Pretreated Biomass:

In all the studies shown below the pretreated biomass sample was designated “Jaygo 2” (5% solids pretreated corn cob). The composition of “Jaygo 2” was determined and is shown in the Table, below. These values were used to calculate percent conversion during enzymatic hydrolysis. The table lists the composition of “Jaygo 2” and the theoretical concentration of glucose and xylose after 100% conversion of a 5% solids reaction.

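| | Percent Composition | Ratio (liquid/solid) | Total | Theoretical 100% conversion (5% solids) (g/L) | Theoretical 100% conversion (5% solids) (mM) |
|---|---|---|---|---|---|
| Glucan | 42.9 | 0.010 | 43.33 | 21.67 | 120.37 |
| Xylan | 31.22 | 0.084 | 33.85 | 16.93 | 112.84 |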
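The relationship between the composition values and the theoretical 100% conversion concentrations can be sketched as follows. This is an illustrative reconstruction, not the authors' stated procedure: it assumes that 5% solids corresponds to 50 g solids per liter, and that the millimolar values were obtained by dividing by 180 g/mol for glucose and 150 g/mol for xylose, which are the values that reproduce the table. The function names are hypothetical.

```python
GLUCOSE_G_PER_MOL = 180.0  # assumed; reproduces the table's ~120.37 mM
XYLOSE_G_PER_MOL = 150.0   # assumed; reproduces the table's ~112.84 mM


def theoretical_mm(total_pct, solids_pct, g_per_mol):
    """Theoretical sugar concentration (mM) at 100% conversion."""
    grams_solids_per_liter = solids_pct * 10.0  # e.g. 5% solids -> 50 g/L
    grams_sugar_per_liter = grams_solids_per_liter * total_pct / 100.0
    return grams_sugar_per_liter / g_per_mol * 1000.0


def percent_conversion(measured_mm, theoretical):
    """Percent conversion from a measured monomer concentration (mM)."""
    return 100.0 * measured_mm / theoretical


glucose_100 = theoretical_mm(43.33, 5.0, GLUCOSE_G_PER_MOL)
xylose_100 = theoretical_mm(33.85, 5.0, XYLOSE_G_PER_MOL)
print(round(glucose_100, 1), round(xylose_100, 1))
```

Under the same assumptions, doubling the solids to 10% doubles both theoretical concentrations, consistent with the 240 mM glucose and 226 mM xylose used as 100% for the 10% solids reactions.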

Each reaction was sampled at various time points and product concentration was determined by HPLC-RI. An example of a chromatogram is shown in FIG. 41, a representative HPLC trace of the products of biomass digestion. Product detection was by refractive index. In FIG. 41, G1 is glucose and X1 is xylose.

In order to directly compare activity performance of exemplary enzymes of the invention to a commercial benchmark, SPEZYME® cellulase was used in these studies; its performance was tested under the same conditions. The standard dosage of SPEZYME® cellulase of 15 FPU/g cellulose is equivalent to 58 mg protein/g cellulose. In the following experiments 7.5 FPU cellulase (29 mg) was combined with the protein equivalent of MULTIFECT® xylanase for a total of 58 mg/g cellulose. FIG. 42 graphically illustrates data obtained from cellulase digestion using 5% solids (Jaygo 2) (5% solids pretreated corn cob) in both absolute concentration and percent conversion. The assay shown in FIG. 42 is the digestion of Jaygo 2 (5% solids) using 7.5 FPU/g cellulose SPEZYME® cellulase plus 7.5 “FPU equivalents”/g cellulose MULTIFECT® xylanase (in total 58 mg/g cellulose). Percent conversion was based on 120 mM glucose and 113 mM xylose as 100%. FIG. 43 shows the data set for 10% solids. FIG. 43 graphically illustrates data from the digestion of Jaygo 2 (10% solids) using 7.5 FPU/g cellulose SPEZYME® cellulase plus 7.5 “FPU equivalents”/g cellulose MULTIFECT® xylanase (in total 58 mg/g cellulose). Percent conversion was based on 240 mM glucose and 226 mM xylose as 100%.

Therefore benchmark performance is:

| Performance Parameters | Benchmark SPEZYME® enzyme | Benchmark SPEZYME® enzyme |
|---|---|---|
| mg active enzyme/g cellulose | 58 | 58 |
| Glucose: % Conversion | 75 | 73 |
| Glucose: Time for conversion (hr) | 48 | 48 |
| Xylose: % Conversion | 59 | 57 |
| Xylose: Time for conversion (hr) | 48 | 48 |
| % Solids | 5 | 10 |

Enzymes of the Invention—5% Solids

A cocktail of 10 enzymes (the exemplary so-called “E10” cocktail) which showed very high biomass saccharification activity was developed. Four of the enzymes are responsible for digesting cellulose while the remainder are active on hemicellulose. As described above, a combination of protein purification, SDS-PAGE analysis and enzyme assays allowed a quantitative measure of the amount of active enzyme in each of the crude preparations. In order to reduce overall protein used in saccharification reactions, a systematic approach was undertaken to remove redundant and unnecessary enzymes from the E10 cocktail. It was determined that 2 of the enzymes, SEQ ID NO:442 (encoded by, e.g., SEQ ID NO:441) (an α-glucuronidase) and SEQ ID NO:440 (encoded by, e.g., SEQ ID NO:439) (a ferulic acid esterase), contributed very little to overall performance and were removed from the cocktail, resulting in an E8 mixture. Finally, experiments were carried out to determine which of the cellobiohydrolases (CBH I, CBH II, SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) and SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33)) were the most effective. The performance from three different mixes was assessed. The composition of each of these mixes is shown in the tables below (Case 1, 2 and 3). The tables show how much of each enzyme was used in the cocktails and estimates of active enzyme in each of these preparations (expressed in mg enzyme/g cellulose). In all three cases the total enzyme composition was tabulated and was below the 20 mg/g cellulose limit outlined in the target (case 1=18.4 mg/g; case 2=19.2 mg/g and case 3=17.2 mg/g).

Case 1 (CBH I/CBH II)

| | E8 Cocktail Components | % of powder | powder mg/mL | Enzymes mg/mL | substrate mg/mL | Glucan % | Total g cellulose | Total enzyme (mg/g cellulose) | Enzyme % | Pure enzyme (mg/g cellulose) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SEQ ID NO: 106 (encoded by SEQ ID NO: 105) | 0.63 | 2 | 1.3 | 50 | 0.4333 | 0.0217 | 58.2 | 3.00% | 1.7 |
| 2 | SEQ ID NO: 264 (encoded by SEQ ID NO: 263) | 0.67 | 1.5 | 1.0 | 50 | 0.4333 | 0.0217 | 46.4 | 5.60% | 2.6 |
| 3 | CBH I* | 1 | 0.25 | 0.3 | 50 | 0.4333 | 0.0217 | 11.5 | 87.00% | 10.0 |
| 4 | CBH II** | 1 | 0.1 | 0.1 | 50 | 0.4333 | 0.0217 | 4.6 | 51.00% | 2.4 |
| 5 | SEQ ID NO: 100 (encoded by SEQ ID NO: 99) | 0.9 | 0.1 | 0.1 | 50 | 0.4333 | 0.0217 | 4.2 | 15.50% | 0.6 |
| 6 | SEQ ID NO: 96 (encoded by SEQ ID NO: 95) | 0.81 | 0.5 | 0.4 | 50 | 0.4333 | 0.0217 | 18.7 | 2.70% | 0.5 |
| 7 | SEQ ID NO: 92 (encoded by SEQ ID NO: 91) | 0.74 | 0.025 | 0.0 | 50 | 0.4333 | 0.0217 | 0.9 | 34.00% | 0.3 |
| 8 | SEQ ID NO: 440 (encoded by SEQ ID NO: 439) | 1 | 0 | 0.0 | 50 | 0.4333 | 0.0217 | 0.0 | 5.00% | 0.0 |
| 9 | SEQ ID NO: 442 (encoded by SEQ ID NO: 441) | 1 | 0 | 0.0 | 50 | 0.4333 | 0.0217 | 0.0 | 5.00% | 0.0 |
| 10 | SEQ ID NO: 102 (encoded by SEQ ID NO: 101) | 1 | 0.025 | 0.0 | 50 | 0.4333 | 0.0217 | 1.2 | 18.70% | 0.2 |
| | TOTAL | | | | | | | | | 18.4 |
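The Case 1 total can be re-derived from the tabulated inputs with a short calculation. The sketch below rests on an interpretation of the column headings that is an assumption, not something the text states: the first numeric column is read as the protein fraction of the enzyme powder, the second as the powder dose in mg/mL, and the enzyme percentage as the fraction of that protein which is active enzyme. All names in the code are hypothetical.

```python
SUBSTRATE_MG_PER_ML = 50.0  # 5% solids, from the table
GLUCAN_FRACTION = 0.4333    # glucan content of Jaygo 2, from the table

# (label, protein fraction of powder, powder mg/mL, fraction active)
# Column meanings are an assumed reading of the table headings.
CASE_1 = [
    ("SEQ ID NO:106", 0.63, 2.0, 0.030),
    ("SEQ ID NO:264", 0.67, 1.5, 0.056),
    ("CBH I", 1.0, 0.25, 0.87),
    ("CBH II", 1.0, 0.1, 0.51),
    ("SEQ ID NO:100", 0.9, 0.1, 0.155),
    ("SEQ ID NO:96", 0.81, 0.5, 0.027),
    ("SEQ ID NO:92", 0.74, 0.025, 0.34),
    ("SEQ ID NO:440", 1.0, 0.0, 0.05),
    ("SEQ ID NO:442", 1.0, 0.0, 0.05),
    ("SEQ ID NO:102", 1.0, 0.025, 0.187),
]


def active_mg_per_g_cellulose(protein_fraction, powder_mg_per_ml, active_fraction):
    """Active enzyme dose in mg per g cellulose for one cocktail component."""
    cellulose_g_per_ml = SUBSTRATE_MG_PER_ML * GLUCAN_FRACTION / 1000.0
    total_mg_per_g = protein_fraction * powder_mg_per_ml / cellulose_g_per_ml
    return total_mg_per_g * active_fraction


total_active = sum(active_mg_per_g_cellulose(p, m, a) for _, p, m, a in CASE_1)
print(f"Case 1 total: {total_active:.1f} mg active enzyme/g cellulose")  # 18.4
```

Under this reading, CBH I alone accounts for roughly 10 of the 18.4 mg/g, which matches the tabulated row and shows why the choice of cellobiohydrolase dominates the protein budget.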

Case 2 (CBH I/SEQ ID NO: 98 (encoded by, e.g., SEQ ID NO: 97))

| | E8 Cocktail Components | % of powder | powder mg/mL | Enzymes mg/mL | substrate mg/mL | Glucan % | Total g cellulose | Total enzyme (mg/g cellulose) | Enzyme % | Pure enzyme (mg/g cellulose) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SEQ ID NO: 106 (encoded by SEQ ID NO: 105) | 0.63 | 2 | 1.3 | 50 | 0.4333 | 0.0217 | 58.2 | 3.00% | 1.7 |
| 2 | SEQ ID NO: 264 (encoded by SEQ ID NO: 263) | 0.67 | 1.5 | 1.0 | 50 | 0.4333 | 0.0217 | 46.4 | 5.60% | 2.6 |
| 3 | CBH I* | 1 | 0.25 | 0.3 | 50 | 0.4333 | 0.0217 | 11.5 | 87.00% | 10.0 |
| 4 | SEQ ID NO: 98 (encoded by SEQ ID NO: 297) | 1 | 0.15 | 0.2 | 50 | 0.4333 | 0.0217 | 6.9 | 46.00% | 3.2 |
| 5 | SEQ ID NO: 100 (encoded by SEQ ID NO: 99) | 0.9 | 0.1 | 0.1 | 50 | 0.4333 | 0.0217 | 4.2 | 15.50% | 0.6 |
| 6 | SEQ ID NO: 96 (encoded by SEQ ID NO: 95) | 0.81 | 0.5 | 0.4 | 50 | 0.4333 | 0.0217 | 18.7 | 2.70% | 0.5 |
| 7 | SEQ ID NO: 92 (encoded by SEQ ID NO: 91) | 0.74 | 0.025 | 0.0 | 50 | 0.4333 | 0.0217 | 0.9 | 34.00% | 0.3 |
| 8 | SEQ ID NO: 440 (encoded by SEQ ID NO: 439) | 1 | 0 | | | 0.4333 | 0.000 | | 5.00% | |
| 9 | SEQ ID NO: 442 (encoded by SEQ ID NO: 441) | 1 | 0 | | | 0.4333 | 0.000 | | 5.00% | |
| 10 | SEQ ID NO: 102 (encoded by SEQ ID NO: 101) | 1 | 0.025 | 0.0 | 50 | 0.4333 | 0.0217 | 1.2 | 18.70% | 0.2 |
| | TOTAL | | | | | | | | | 19.2 |

Case 3 (SEQ ID NO: 34 (encoded by, e.g., SEQ ID NO: 33)/SEQ ID NO: 98 (encoded by, e.g., SEQ ID NO: 97))

| | E8 Cocktail Components | % of powder | powder mg/mL | Enzymes mg/mL | substrate mg/mL | Glucan % | Total g cellulose | Total enzyme (mg/g cellulose) | Enzyme % | Pure enzyme (mg/g cellulose) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SEQ ID NO: 106 (encoded by SEQ ID NO: 105) | 0.63 | 2 | 1.260 | 50 | 0.4333 | 0.0217 | 58.2 | 3.00% | 1.7 |
| 2 | SEQ ID NO: 264 (encoded by SEQ ID NO: 263) | 0.67 | 1.5 | 1.005 | 50 | 0.4333 | 0.0217 | 46.4 | 5.60% | 2.6 |
| 3 | SEQ ID NO: 34 (encoded by SEQ ID NO: 33) | 1 | 0.75 | 0.750 | 50 | 0.4333 | 0.0217 | 34.6 | 20.00% | 6.9 |
| 4 | SEQ ID NO: 98 (encoded by SEQ ID NO: 297) | 1 | 0.2 | 0.200 | 50 | 0.4333 | 0.0217 | 9.2 | 46.00% | 4.2 |
| 5 | SEQ ID NO: 100 (encoded by SEQ ID NO: 99) | 0.9 | 0.1 | 0.090 | 50 | 0.4333 | 0.0217 | 4.2 | 15.50% | 0.6 |
| 6 | SEQ ID NO: 96 (encoded by SEQ ID NO: 95) | 0.81 | 0.5 | 0.405 | 50 | 0.4333 | 0.0217 | 18.7 | 2.70% | 0.5 |
| 7 | SEQ ID NO: 92 (encoded by SEQ ID NO: 91) | 0.74 | 0.025 | 0.019 | 50 | 0.4333 | 0.0217 | 0.9 | 34.00% | 0.3 |
| 8 | SEQ ID NO: 440 (encoded by SEQ ID NO: 439) | 1 | 0 | 0.000 | 50 | 0.4333 | 0.0217 | 0.0 | 5.00% | 0.0 |
| 9 | SEQ ID NO: 442 (encoded by SEQ ID NO: 441) | 1 | 0 | | | 0.4333 | 0.000 | | 5.00% | |
| 10 | SEQ ID NO: 102 (encoded by SEQ ID NO: 101) | 1 | 0.025 | 0.0 | 50 | 0.4333 | 0.0217 | 1.2 | 18.70% | 0.2 |
| | TOTAL | | | | | | | | | 17.2 |

FIGS. 44 and 45 show the time courses of saccharification of Jaygo 2 (5% solids pretreated corn cob) using the three enzyme mixes. While there were some minor differences in rates between the cases, all three resulted in almost exactly 80% recovery of glucose and 62% recovery of xylose within 48 hrs. FIG. 44 data demonstrates glucose release from Jaygo 2 (5% solids) catalyzed by three different exemplary E8 enzyme cocktails of the invention: CBH I/CBH II is the Case 1 table; CBH I/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) is the Case 2 table; and SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33)/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) is the Case 3 table. Glucose concentration was determined by HPLC analysis of the saccharified liquors sampled at 4, 20, 30 and 48 hrs. Percent conversion was calculated by using 120 mM as 100% available glucose in the pretreated solids. Reaction conditions are pH 5.5 and 50° C. FIG. 45 data demonstrates xylose release from Jaygo 2 (5% solids) catalyzed by these three different exemplary enzyme cocktails of the invention: CBH I/CBH II is the Case 1 table; CBH I/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) is the Case 2 table; and SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33)/SEQ ID NO:98 (encoded by, e.g., SEQ ID NO:97) is the Case 3 table. Xylose concentration was determined by HPLC analysis of the saccharified liquors sampled at 4, 20, 30 and 48 hrs. Percent conversion was calculated by using 113 mM as 100% available xylose in the pretreated solids. Reaction conditions are pH 5.5 and 50° C.

The performance of the exemplary enzyme cocktail “E8” compared to SPEZYME® cellulase is tabulated below:

| Performance Parameters | Benchmark SPEZYME® enzyme* | Diversa Case 1¹ | Diversa Case 2¹ | Diversa Case 3¹ |
|---|---|---|---|---|
| mg active enzyme/g cellulose | 58 | 18.4 | 19.2 | 17.2 |
| Glucose: % Conversion | 75 | 76 | 79 | 76 |
| Glucose: Time for conversion (hr) | 48 | 48 | 48 | 48 |
| Xylose: % Conversion | 59 | 57² | 58² | 59² |
| Xylose: Time for conversion (hr) | 48 | <20 | <20 | <20 |
| % Solids | 5 | 5 | 5 | 5 |

In summary, the exemplary enzyme cocktail “E8” outperformed SPEZYME® cellulase/MULTIFECT® xylanase (rate and extent) with approximately one-third the amount of protein per gram (protein/g) cellulose.
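The "approximately one-third" figure can be checked directly from the tabulated loadings. The short sketch below simply divides each E8 case loading by the 58 mg/g cellulose benchmark loading; the variable names are illustrative only.

```python
BENCHMARK_MG_PER_G = 58.0  # SPEZYME/MULTIFECT loading, mg protein/g cellulose
E8_LOADINGS = {"Case 1": 18.4, "Case 2": 19.2, "Case 3": 17.2}

# Fraction of the benchmark protein loading used by each E8 cocktail.
ratios = {case: loading / BENCHMARK_MG_PER_G for case, loading in E8_LOADINGS.items()}
for case, ratio in ratios.items():
    print(f"{case}: {ratio:.2f} of benchmark protein loading")
```

All three ratios fall between roughly 0.30 and 0.33, consistent with the one-third statement.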

Exemplary Enzymes—Higher Solids Saccharification

Ultimately a biomass process will require solids loadings higher than 5% with low enzyme content. Therefore we set out to evaluate the performance of the Diversa cocktails at 10% solids and then reduce the amount of protein in the cocktails from 20 mg/g cellulose to approximately 12 mg/g cellulose. Initial experiments were performed at enzyme loadings similar to the standard SPEZYME® cellulase/MULTIFECT® xylanase mixtures (58 mg protein/g cellulose). These data are shown in FIG. 46. Under these reaction conditions the exemplary E9 cocktail reached 74% and 70% conversion for glucose and xylose, respectively. FIG. 46 data demonstrates the digestion of Jaygo 2 (10% solids pretreated corn cob) using 58 mg “E9 cocktail”/g cellulose. Percent conversion was based on 240 mM glucose and 226 mM xylose as 100%. The following table summarizes the performance characteristics of an exemplary E9 cocktail at 58 mg/g cellulose loading and 10% solids:

| Performance Parameters | exemplary E9 cocktail |
|---|---|
| mg active enzyme/g cellulose | 58 |
| Glucose: % Conversion | 74 |
| Glucose: Time for conversion (hr) | 48 |
| Xylose: % Conversion | 71 |
| Xylose: Time for conversion (hr) | 48 |
| % Solids | 10 |

The next goal was to decrease protein dosage to approximately 12 mg/g cellulose. Four different recipes for the exemplary enzyme mixture called E8 (the “E8 cocktails”) were used, altering the hemicellulase and cellulase ratios. The table below details the recipes of the four exemplary cocktails and the amount of xylose and glucose released at 36 hrs. This table summarizes data showing the performance of four different exemplary “E8 cocktails” on 10% Jaygo 2 (10% solids pretreated corn cob):

| Conv % | G1 - 36 hr | X1 - 36 hr | G1 - 48 hr | X1 - 48 hr | mg/g cellulose |
|---|---|---|---|---|---|
| 1xE, CBH1(0.25)/SEQ ID NO: 100(0.75) | 46.7 | 61.3 | 49.7 | 60.7 | 11.9 |
| 1xE, CBH1(0.24)/SEQ ID NO: 100(1) | 42.2 | 60.4 | 46.7 | 60.6 | 12 |
| 1xE, CBH1(0.24)/SEQ ID NO: 98(0.1)/SEQ ID NO: 100(1) | 50.1 | 61.4 | 52.0 | 60.2 | 12 |
| 1xE, SEQ ID NO: 34(0.75)/SEQ ID NO: 98(0.1)/SEQ ID NO: 100(1) | 50.5 | 61.5 | 54.7 | 61.9 | 12 |

Cellulose hydrolysis appeared to be sensitive to both the cellulase and hemicellulase concentrations (a synergy between the enzyme types), whereas hemicellulose hydrolysis (as measured by xylose release) appeared to be sensitive only to hemicellulase content. Under these conditions xylose conversion is maintained at about 60% at 36 hrs while glucose conversion drops to approximately 50% as compared to performance at a higher enzyme loading.

A systematic study was undertaken in order to clarify the interplay between biomass solids content and enzyme loading. Reactions were set up with 18 mg protein/g cellulose and 9 mg protein/g cellulose at 1%, 5% and 10% Jaygo 2 (pretreated corn cob).

Time courses for glucose, expressed in percent conversion and concentration, are shown in FIGS. 47 to 50, and time courses for xylose (also expressed in percent conversion and concentration) are shown in FIGS. 51 to 54. Though more sugar is released at the higher solids loading, the percent conversion decreases. Glucose release was much more sensitive to solids loading than xylose release; in fact, at the high enzyme load (18 mg/g) there was almost no difference in xylose yield between the different percent solids in the reactor. Possible explanations for the decrease in performance as substrate concentration increases are (1) product inhibition by glucose, xylose, cellobiose or xylobiose, (2) mass transfer (mixing) deficiencies, or (3) a combination of both.

FIG. 47 data illustrates the time courses for glucose appearance using 18.1 mg of the exemplary enzyme cocktail “E8” per gram cellulose (18.1 mg E8 mix/g cellulose) and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). Percent conversion was based on theoretical glucose yields of 240 mM, 120 mM and 24 mM for 10%, 5% and 1% solids, respectively. FIG. 48 data illustrates time courses for glucose appearance using 18.1 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). FIG. 49 data illustrates time courses for glucose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). Percent conversion was based on theoretical glucose yields of 240 mM, 120 mM and 24 mM for 10%, 5% and 1% solids, respectively. FIG. 50 data illustrates time courses for glucose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). FIG. 51 data illustrates time courses for xylose appearance using 18 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). Percent conversion was based on theoretical xylose yields of 226 mM, 113 mM and 23 mM for 10%, 5% and 1% solids, respectively. FIG. 52 data illustrates time courses for xylose appearance using 18 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). FIG. 53 data illustrates time courses for xylose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). Percent conversion was based on theoretical xylose yields of 226 mM, 113 mM and 23 mM for 10%, 5% and 1% solids, respectively. FIG. 54 data illustrates time courses for xylose appearance using 9 mg of the exemplary E8 enzyme cocktail/g cellulose and 1, 5 and 10% solids (Jaygo 2) (pretreated corn cob). FIGS. 55 and 56 in chart form summarize the data shown in FIGS. 47 to 50 (glucose) and FIGS. 51 to 54 (xylose). FIG. 55: percent glucose conversion at 48 hrs using different enzyme (the exemplary E8 enzyme cocktail) and solids (Jaygo 2) loadings; FIG. 56: percent xylose conversion at 48 hrs using different enzyme (the exemplary E8 enzyme cocktail) and solids (Jaygo 2) loadings. These data are summarized in the table, below:

Summary Table
Performance Parameters

Performance parameters            Benchmark   Measured Benchmark*   Exemplary E8 cocktail   Exemplary E9 cocktail
mg active enzyme/g cellulose      20          58 (15 FPU)           12                      58
Glucose: % conversion             80          73                    50                      74
Glucose: Conversion time (h)      48          48                    36                      48
Xylose: % conversion              65          57                    62                      70
Xylose: Conversion time (h)       48          48                    36                      48
% solids                          2.5         10                    10                      10

* 7.5 FPU SPEZYME® cellulase plus 7.5 “FPU equivalents” MULTIFECT® xylanase
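The theoretical yields quoted for FIGS. 47 to 54 scale linearly with solids loading (240, 120 and 24 mM glucose at 10%, 5% and 1% solids), which is why a higher solids loading can release more sugar in absolute terms while the percent conversion falls. The Python sketch below illustrates this under stated assumptions: the helper names are hypothetical, and the 80% conversion at 1% solids is an invented figure used only for contrast with the 50% glucose conversion reported for the E8 cocktail at 10% solids.

```python
def theoretical_yield_mM(percent_solids: float, yield_at_10_percent_mM: float) -> float:
    """Scale the 10%-solids theoretical yield linearly to another solids loading."""
    return yield_at_10_percent_mM * percent_solids / 10.0


def sugar_released_mM(conversion_percent: float, percent_solids: float,
                      yield_at_10_percent_mM: float) -> float:
    """Convert a percent conversion back into an absolute sugar concentration."""
    maximum = theoretical_yield_mM(percent_solids, yield_at_10_percent_mM)
    return maximum * conversion_percent / 100.0


GLUCOSE_AT_10_PERCENT_MM = 240.0  # from the text
XYLOSE_AT_10_PERCENT_MM = 226.0   # from the text

for solids in (1, 5, 10):
    print(solids,
          theoretical_yield_mM(solids, GLUCOSE_AT_10_PERCENT_MM),
          theoretical_yield_mM(solids, XYLOSE_AT_10_PERCENT_MM))

# 50% at 10% solids is reported for the E8 cocktail; 80% at 1% solids is hypothetical.
print(sugar_released_mM(50.0, 10, GLUCOSE_AT_10_PERCENT_MM))
print(sugar_released_mM(80.0, 1, GLUCOSE_AT_10_PERCENT_MM))
```

The xylose value computed for 1% solids is 22.6 mM, which the text rounds to 23 mM.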

Example 6 Characterization of the Activity of Enzymes of the Invention

This example characterizes exemplary enzymes of the invention. For example, this example describes how exemplary enzymes of the invention can be used as cellulolytic enzymes for the hydrolysis of biomass, e.g., plant biomass, such as pretreated corn stover. In one aspect, exemplary enzymes of the invention are used alone or in combination as endoglucanases, cellobiohydrolases and/or β-glucosidases for, e.g., the treatment, e.g., saccharification, of cellulose or cellulose-comprising compositions, such as plant biomass, e.g., pretreated stover or fiber.

In this example, forty-five (45) enzymes of the invention are characterized: 16 are classified as β-glucosidases, 5 are endoglucanases and 24 are cellobiohydrolases. These enzymes alone or in combination can be used in the hydrolysis of cellulose in biomass, e.g., plant biomass, such as pretreated corn stover/fiber. Exemplary enzymes of the invention are listed in the table below, wherein the odd SEQ ID NOs: are nucleotide sequences and the even SEQ ID NOs: are amino acid sequences; for example, to aid in reading the table, SEQ ID NO:2, encoded by, e.g., SEQ ID NO:1, has cellobiohydrolase activity; SEQ ID NO:4, encoded by, e.g., SEQ ID NO:3, has β-glucosidase activity, etc.:

SEQ ID NO:   Activity Class
1, 2         Cellobiohydrolase
3, 4         β-glucosidase
5, 6         β-glucosidase
7, 8         β-glucosidase
9, 10        Cellobiohydrolase
11, 12       Cellobiohydrolase
13, 14       β-glucosidase
15, 16       β-glucosidase
17, 18       β-glucosidase
19, 20       Cellobiohydrolase
21, 22       Cellobiohydrolase
23, 24       β-glucosidase
25, 26       Cellobiohydrolase
27, 28       Cellobiohydrolase
29, 30       β-glucosidase
31, 32       β-glucosidase
33, 34       Cellobiohydrolase
35, 36       Cellobiohydrolase
37, 38       Endoglucanase
39, 40       Endoglucanase
41, 42       β-glucosidase
43, 44       Cellobiohydrolase
45, 46       Cellobiohydrolase
47, 48       Endoglucanase
49, 50       β-glucosidase
51, 52       Cellobiohydrolase
53, 54       Cellobiohydrolase
55, 56       Cellobiohydrolase
57, 58       β-glucosidase
59, 60       Cellobiohydrolase
61, 62       Endoglucanase
63, 64       Cellobiohydrolase
65, 66       Cellobiohydrolase
67, 68       Cellobiohydrolase
69, 70       β-glucosidase
71, 72       Cellobiohydrolase
73, 74       Cellobiohydrolase
75, 76       β-glucosidase
77, 78       Cellobiohydrolase
79, 80       β-glucosidase
81, 82       Cellobiohydrolase
83, 84       Cellobiohydrolase
85, 86       Endoglucanase
87, 88       Cellobiohydrolase
89, 90       β-glucosidase

The invention also provides for the manipulation and/or modification of the sequences of the invention, including the exemplary sequences noted above. One of skill in the art would recognize that such modification can occur at one or more base pairs, codons, introns, exons, or amino acid residues, yet still retain the biological (e.g., enzymatic, or substrate binding) activity of the enzyme of the invention.

Variants of the polypeptide sequences of the invention may include one or more amino acid substitutions, e.g., with a conserved or non-conserved amino acid residue. In one aspect, substituted amino acid residues may or may not be one encoded by the genetic code. In one aspect, variants comprising one or more amino acid residues of the polypeptides of the invention comprise a substituent group or substituent groups. Still other variants of the invention comprise polypeptides associated with another compound (e.g., a mixture, heteroconjugate or heterodimer, etc.), such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol). Additional embodiments comprising enzymes of the invention are those in which additional amino acids are fused to the enzyme, or joined as a heteroconjugate recombinant protein, where the heterologous sequence comprises, e.g., non-enzymatic sequence. The joined or fused additional amino acids can comprise a leader sequence or a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide. Methods of making variants are familiar to those skilled in the art.

In another aspect, enzymes of the invention have additional or different enzymatic activities from those noted above. For example, in one aspect, an enzyme of the invention having β-glucosidase activity also can have β-mannosidase, or another, activity. In another aspect, an enzyme of the invention having endoglucanase activity can have exoglucanase activity or another activity.

Example 7 Xylanase for Enhancing Digestion of Cellulose in Lignocellulosic Biomass

This example describes and characterizes an exemplary enzyme of the invention having xylanase activity that enhances the digestion of cellulose in a lignocellulosic biomass; in particular, a plant biomass comprising corn stover.

This example describes studies demonstrating the enhancement of cellulase action on biomass samples containing both cellulose and hemicellulose when a xylanase is present in the reaction mixture. In one aspect, the xylanase itself has no cellulase activity but enhances cellulolytic activity, e.g., in one aspect, by removing interfering hemicellulose to reveal additional reactive sites on the cellulose. Though the xylanase used here is the exemplary enzyme of the invention SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443), any xylanase or hemicellulase can be used to practice the compositions and methods of the invention.

The substrate used in these studies comprised an alkaline pretreated corn stover (alkPCS) sample. Specifically, this corn stover sample was treated with NH₄OH (5% or 15%) at 140° C. or 170° C. “High”, “low” and “medium” severity refer to the conditions of pretreatment. High severity conditions were 15% NH₄OH and 170° C. for 5 minutes; low severity conditions were 5% NH₄OH and 140° C.; and medium severity conditions were 15% NH₄OH and 140° C. The solids composition of the high severity alkPCS was determined and is shown in Table 1:

TABLE 1
Composition of High Severity alkPCS.

Component       Weight percent
Protein         3
Ash             4.7
Lignin          11.4
Glucan          45.7
Xylan           26.6
Galactan        0.9
Arabinan        3.1
Mannan          2
Uronic acids
Acetyl
Total           97.4

Digestion of this material (2.2% solids) was carried out using 3 different concentrations of the endoglucanase SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) (in crude E. coli lysate) at pH 5 and 50° C. Product release (cellobiose and glucose) was monitored over time (data summarized and illustrated in FIG. 17) using an HPLC method. Observed cellobiose was converted into “glucose equivalents” prior to plotting the data. 100% conversion of all the glucan would result in approximately 56 mM glucose. FIG. 17 illustrates data showing the glucose release from high severity alkPCS due to the enzymatic activity of the exemplary endoglucanase of the invention SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105).
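The two calculations mentioned above, converting cellobiose to glucose equivalents and estimating the glucose available from the glucan in Table 1, can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually used: it assumes that 2.2% solids means 22 g/L (weight per volume) and that each cellobiose counts as two glucose units. Under those assumptions the stated figure of approximately 56 mM appears consistent with dividing the glucan mass by the molar mass of free glucose (180.16 g/mol); the anhydroglucose unit (162.14 g/mol) is shown only for comparison.

```python
GLUCOSE_MW = 180.16         # g/mol, free glucose
ANHYDROGLUCOSE_MW = 162.14  # g/mol, glucose unit within the glucan polymer


def glucose_equivalents_mM(glucose_mM: float, cellobiose_mM: float) -> float:
    """Each cellobiose molecule is counted as two glucose units."""
    return glucose_mM + 2.0 * cellobiose_mM


def theoretical_glucose_mM(percent_solids: float, glucan_weight_percent: float,
                           unit_mw: float) -> float:
    """Glucose concentration if all glucan were hydrolyzed.

    Assumes percent solids is weight per volume (1% = 10 g/L).
    """
    solids_g_per_L = percent_solids * 10.0
    glucan_g_per_L = solids_g_per_L * glucan_weight_percent / 100.0
    return glucan_g_per_L / unit_mw * 1000.0


# 2.2% solids and 45.7% glucan are taken from the text and Table 1.
print(round(theoretical_glucose_mM(2.2, 45.7, GLUCOSE_MW), 1))
print(round(theoretical_glucose_mM(2.2, 45.7, ANHYDROGLUCOSE_MW), 1))

# Hypothetical HPLC readings (mM), for illustration only.
print(glucose_equivalents_mM(glucose_mM=1.0, cellobiose_mM=2.5))
```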

The rate of glucose appearance was enzyme-concentration dependent; however, the extent of hydrolysis seemed to reach a maximum of between 10% and 14%. Similar results were obtained using the exemplary xylanase SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) (as illustrated in FIG. 18), except that xylose and xylobiose were released and the extent of digestion was between 40% and 50% (complete conversion would release approx. 40 mM of xylose). The exemplary xylanase SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) did not release any glucose, either as monomer or as cellobiose. FIG. 18 illustrates in graphic form data demonstrating xylose release from high severity alkaline pretreated corn stover (alkPCS) by the exemplary xylanase SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443).

The combination of the exemplary SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) and SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) resulted in a substantial increase in the rate of glucose release, reaching maximum conversion at a much shorter time; however, no increase in extent was observed, as illustrated in FIG. 19. There was no improvement in the rate of xylose release by the addition of the endoglucanase. FIG. 19 illustrates in graphic form data demonstrating digestion of high severity alkPCS by the exemplary SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) and by the exemplary SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) and SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105). Each enzyme was added to 1 mg/ml total protein and the digestion occurred at pH 5 and 50° C. using 2.2% solids.

Partial digestion of high severity alkaline pretreated corn stover (alkPCS) is also observed when cellobiohydrolase (CBH) is used, as demonstrated by data illustrated in FIGS. 20A and 20B. Purified CBH I and CBH II from Trichoderma reesei were purchased from Megazyme (Co. Wicklow, Ireland) and added to a final concentration of 0.05 mg/ml to reactions containing 2.2% alkPCS (pH 5 and 50° C.). The exemplary β-glucosidase SEQ ID NO:264 (encoded by, e.g., SEQ ID NO:263) was also added to convert cellobiose to glucose. When the exemplary xylanase SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) (0.4 mg/ml) is added to the reaction mix, a significant enhancement is observed in both rate and extent of glucose release, see FIGS. 20A and 20B. Furthermore, when the exemplary endoglucanase SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) (1 mg/ml) is included, there is an additional increase in the amount of glucose released, reaching conversion levels of close to 55% for the CBH I combination and 30% for the CBH II combination. FIG. 20 illustrates data showing both rate and extent of glucose release using combinations of CBH I (A) and CBH II (B) with the exemplary xylanase SEQ ID NO:444 (encoded by, e.g., SEQ ID NO:443) and the exemplary endoglucanase SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105).

Example 8 Degradation of Cellulosic Materials with Thermostable Endoglucanases

This example describes the discovery of and characterization of the activity of thermostable endoglucanases (EGs) of the invention.

In some aspects, endoglucanases of the invention, and enzymes used to practice the methods of the invention, are capable of digesting (e.g., hydrolyzing) crystalline cellulose under conditions comprising a temperature anywhere in the range of between about 60° C. and 80° C. In some aspects, endoglucanases of the invention are useful in high temperature biomass saccharification processes coupled with fermentation to ethanol. In some aspects, carrying out these biomass saccharification processes at elevated temperatures is advantageous due to increased reaction rates that occur as the temperature is raised. Hydrolysis of crystalline cellulose is important since this material typically presents a challenge to enzymatic digestion due to its highly ordered and stable structure. In addition, crystalline cellulose constitutes a large percentage of the cellulose in lignocellulosic materials that are used in a biomass to ethanol process of the invention.

This example describes methods used to identify thermostable EGs of the invention that possess high hydrolytic activity on cellulosic materials at elevated temperatures. Thermostable endoglucanases were tested for activity using microcrystalline cellulose AVICEL® (MCC) because MCC is 50% to 60% crystalline; thus, performance on MCC would be expected to be predictive of performance on pretreated corn stover (PCS). There is literature evidence that this is true for dilute acid PCS. Although the final process may not demand thermostable enzymes, a high temperature saccharification may have improved rates of hydrolysis, thereby decreasing enzyme loading and/or residence time requirements.

In one aspect, this method applies to protein mixtures that contain an endoglucanase (e.g., clear cell lysates). The specific activity of EGs on the soluble cellulose substrate carboxymethyl cellulose (CMC), termed CMCase activity (μmol min⁻¹ per mg of protein), is determined at 37° C., pH 7.0; the data are summarized in the table illustrated in FIG. 28. The hydrolytic activity of EGs on microcrystalline cellulose AVICEL® (MCC), termed “Avicelase activity” and expressed as mM reducing sugar released per mg of protein in 24 h, is also determined at various temperatures (e.g., 37° C., 60° C. and 80° C.) and pHs (e.g., pH 5, pH 7 and pH 9). EGs with high “Avicelase activity” at elevated temperatures (e.g., >60° C.) are identified. The ratio of Avicelase activity to the CMCase units used in the reactions is calculated and used to further identify EGs that possess high specific Avicelase activity.
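The ratio-based selection described above can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the class and function names are hypothetical, and the activity values are invented rather than taken from FIG. 28. The sketch only shows the idea of dividing Avicelase activity by CMCase activity and keeping the candidates whose ratio exceeds that of a benchmark enzyme.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EgAssay:
    name: str
    avicelase: float  # mM reducing sugar released per mg protein in 24 h
    cmcase: float     # CMCase activity, umol/min per mg protein

    def ratio(self) -> float:
        """Avicelase activity divided by CMCase activity."""
        if self.cmcase <= 0:
            raise ValueError(f"{self.name}: CMCase activity must be positive")
        return self.avicelase / self.cmcase


def outperformers(candidates: List[EgAssay], benchmark: EgAssay) -> List[str]:
    """Names of candidates with a higher ratio than the benchmark, best first."""
    cutoff = benchmark.ratio()
    better = [c for c in candidates if c.ratio() > cutoff]
    better.sort(key=lambda c: c.ratio(), reverse=True)
    return [c.name for c in better]


# Hypothetical values, for illustration only.
benchmark = EgAssay("benchmark EG", avicelase=1.0, cmcase=20.0)
candidates = [
    EgAssay("EG-A", avicelase=1.5, cmcase=10.0),
    EgAssay("EG-B", avicelase=0.4, cmcase=40.0),
    EgAssay("EG-C", avicelase=0.9, cmcase=9.0),
]
print(outperformers(candidates, benchmark))
```

Normalizing by CMCase activity keeps an enzyme that is merely abundant or highly active on soluble substrate from being mistaken for one that is efficient on crystalline cellulose.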

Exemplary endoglucanases were assayed for their specific activity on CMC and their CMCase activity was obtained. These enzymes were also used to hydrolyze AVICEL® at 37, 60 and 80° C., and pH 5, 7 and 9. In these assays, 1% of AVICEL® and 1 mg mL⁻¹ of EG-containing cell lysates were used. The products released (in mM) after 24 h incubation were measured using a reducing sugar assay, the BCA assay. A commercially available enzyme (Trichoderma longibrachiatum EG from Megazyme International, Ireland) was used as a benchmark for comparison. A total of twelve EGs exhibited high AVICEL® hydrolytic activity at 60° C. or above. Among them, the exemplary enzymes SEQ ID NO:318 (encoded by, e.g., SEQ ID NO:317), 9963, 10061, and SEQ ID NO:314 (encoded by SEQ ID NO:313) had a greater Avicelase/CMCase ratio than the benchmark EG.

In one aspect, the methods compare thermostable EGs under conditions in which the amount of enzyme in the reactions is normalized. The thermostable EGs identified from the above-described methods can be further compared for their hydrolytic activity on cellulosic materials when the same number of CMCase units is used in the assays.

For example, the exemplary EGs SEQ ID NO:318 (encoded by, e.g., SEQ ID NO:317), SEQ ID NO:308 (encoded by, e.g., SEQ ID NO:307) and SEQ ID NO:314 (encoded by, e.g., SEQ ID NO:313), and Trichoderma EG were assayed for AVICEL® hydrolysis. In these reactions, 10 units of CMCase of each enzyme were used to digest 1% AVICEL® at 60° C. (FIG. 29A) and 80° C. (FIG. 29B). Aliquots were taken to measure the products released at various time points; the data are illustrated in FIG. 29. At 60° C. (FIG. 29A), exemplary EGs of the invention outperformed the benchmark EG after approximately 5 h of incubation. Two exemplary EGs of the invention, SEQ ID NO:318 (encoded by, e.g., SEQ ID NO:317) and SEQ ID NO:314 (encoded by, e.g., SEQ ID NO:313), maintained Avicelase activity at 80° C. (FIG. 29B), while the benchmark EG was not active at this temperature. FIG. 29 illustrates the hydrolysis of AVICEL® by exemplary EGs under normalized conditions. Reducing sugar concentrations were measured at various time points for Trichoderma EG (open square), the exemplary EGs SEQ ID NO:318 (encoded by, e.g., SEQ ID NO:317) (solid diamond), SEQ ID NO:308 (encoded by, e.g., SEQ ID NO:307) (solid square), and SEQ ID NO:314 (encoded by, e.g., SEQ ID NO:313) (triangle).

The studies described in this example demonstrate the effectiveness of the exemplary thermostable EG enzymes of the invention in degradation of cellulosic materials (e.g., corn stover) at elevated temperatures. Thermostable EG enzymes of the invention, e.g., those identified using the above-described methods, can be used to maximize a synergistic effect among them. In another aspect, enzymes of the invention are used in combination with other thermostable enzymes of the invention, including cellobiohydrolases and β-glucosidases. In another aspect, thermostable enzymes of the invention are used in combination with other (e.g., known) cellulases or related enzymes (e.g., thermostable enzymes), including cellobiohydrolases and β-glucosidases, which also can be thermostable, in various industrial processes, e.g., for processing biomass and/or hydrolyzing compositions comprising celluloses and/or lignocellulosic materials, e.g., from plants. The invention also provides processes utilizing thermostable enzymes of the invention at elevated temperatures, e.g., for use in biomass conversion, for example for the production of bioethanol.

In one screening protocol, approximately 170 endoglucanase genes from environmental sources were subcloned and expressed, and then characterized on the soluble cellulose analog, carboxymethyl cellulose (CMC), and on AVICEL® MCC. Performance at pH 5, pH 7 and pH 9, and 37° C., 60° C. and 80° C. was assayed to define pH and temperature optima for each endoglucanase. Activity on AVICEL® MCC was assessed by measuring the release of soluble sugars after a 24 hr incubation with enzyme. Ninety-five of the endoglucanases digested MCC to some extent, as illustrated in FIG. 30. Of these, 21 were optimally active at 60° C. and 14 were optimally active at 80° C. FIG. 30 graphically illustrates data showing the pH and temperature optima of exemplary enzymes on AVICEL® MCC (1%). Soluble reducing sugars were measured with the bicinchoninic acid (BCA) colorimetric assay.
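Defining pH and temperature optima from the assay grid described above (pH 5, 7 and 9 crossed with 37° C., 60° C. and 80° C.) amounts to picking the condition with the highest sugar release for each enzyme. A minimal Python sketch follows. The function name and the readings are hypothetical, and the rule that an enzyme with no measurable release has no optimum is an assumption about how inactive enzymes might be handled.

```python
from typing import Dict, Optional, Tuple

Condition = Tuple[int, int]  # (pH, temperature in degrees C)


def optimum_condition(sugar_by_condition: Dict[Condition, float]) -> Optional[Condition]:
    """Return the (pH, temperature) with the highest sugar release.

    Returns None when no condition shows any release, i.e. the enzyme
    did not digest the substrate in this screen.
    """
    active = {cond: value for cond, value in sugar_by_condition.items() if value > 0}
    if not active:
        return None
    return max(active, key=lambda cond: active[cond])


# Hypothetical reducing sugar readings (mM) for one enzyme, for illustration only.
readings = {
    (5, 37): 0.10, (5, 60): 0.45, (5, 80): 0.20,
    (7, 37): 0.15, (7, 60): 0.82, (7, 80): 0.30,
    (9, 37): 0.05, (9, 60): 0.25, (9, 80): 0.00,
}
print(optimum_condition(readings))
```

Counting how many enzymes return an optimum at 60° C. or 80° C. would give tallies of the kind reported above.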

The highest level of digestion observed was approximately 30% to 40%, with extent of digestion dependent on the conditions of the assay, e.g., time, temperature and substrate concentration. The reaction stalled at this point, without going to completion. A combination of experiments suggested that the stalling was due to limited access to hydrolysable sites in the substrate, rather than to enzyme instability or product inhibition. This was supported by experiments with phosphoric acid-treated AVICEL® MCC; this substrate was 100% hydrolysable by the tested endoglucanases. Phosphoric acid treatment swells and reduces the crystallinity of cellulose, making it more accessible to enzymatic hydrolysis.

Complete digestion of AVICEL® MCC or pretreated corn stover can be accomplished using an enzyme cocktail of the invention, e.g., see discussion above. Enzyme mixtures of the invention that are effective for the breakdown (hydrolysis) of lignocellulosic material, e.g., on pretreated corn stover, include combinations of cellulase classes, including endoglucanases, β-glucosidases and cellobiohydrolases.

After identifying and characterizing thermostable endoglucanases of the invention that have activity (hydrolysis activity) on AVICEL® microcrystalline cellulose (MCC) and are capable of hydrolysis of cellulose and pretreated material, enzymes of the invention were further characterized for their ability to hydrolyze cellulose-comprising materials, e.g., a plant material, such as pretreated corn stover (PCS). As noted above, AVICEL®, which is 50-60% crystalline, was chosen as the model substrate with the expectation that performance on AVICEL® MCC would be predictive of performance on pretreated corn stover (PCS), noting that this performance may not be predictive for all types of pretreated samples.

As noted above, approximately 170 endoglucanase genes from environmental sources were characterized, and 95 enzymes were shown to digest MCC to some extent. Of these, 21 were optimally active at 60° C. and 14 were optimally active at 80° C. Three different PCS samples were used: steam PCS, dilute acid PCS and severe alkaline PCS. These differ in chemical (composition) and physical properties. All endoglucanases were tested for the ability to release soluble sugar from these three PCS samples using an automated, medium throughput screen. The assays were performed at pH 5 or 7, 37° C., 50° C. or 80° C. with 1% of substrate as glucose and 1 mg/ml total protein (crude extracts). Product was analyzed by a reducing sugar assay. Certain reactions were scaled up and product was analyzed by HPLC.

The Table below lists the exemplary enzymes tested along with the amount of reducing sugar produced from AVICEL® MCC and whether any reducing sugar was observed from the reaction with the three PCS samples (Y=yes). Included in the table is the amount of conversion for severe alkaline PCS in 24 hrs. No entry indicates that there was no product formation.

There does not seem to be a correlation between performance on AVICEL® MCC and performance on alkaline PCS. For example, the exemplary enzyme SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105) performed the best on alkPCS but produced about half the amount of product on AVICEL® MCC as compared to the exemplary enzyme SEQ ID NO:434 (encoded by, e.g., SEQ ID NO:433). Furthermore, several clones were active on alkPCS but inactive on AVICEL® MCC (see the exemplary enzymes SEQ ID NO:202 (encoded by, e.g., SEQ ID NO:201), 10848 and 13626). Twenty-one enzymes have activity on alkPCS. Conversion can be improved by the addition of other enzymatic activities, e.g., the addition of other enzymes, which can be another enzyme of the invention, or an unrelated enzyme, e.g., any enzyme having cellobiohydrolase, β-glucosidase and “hemicellulase” activity.

[sugar] from Severe AVICEL ® in Steam acid alkPCS (% Enzyme Name 24 hrreaction PCS PCS conversion) SEQ ID NO: 434 (encoded, e.g., by SEQ IDNO: 433) 1.79 SEQ ID NO: 156 (encoded, e.g., by SEQ ID NO: 155) 1.67 Y(1.5) SEQ ID NO: 308 (encoded, e.g., by SEQ ID NO: 307) 1.59 Y Y (2.3)SEQ ID NO: 318 (encoded, e.g., by SEQ ID NO: 317) 1.51 Y Y Y (3.1) SEQID NO: 372 (encoded, e.g., by SEQ ID NO: 371) 1.18 SEQ ID NO: 314(encoded, e.g., by SEQ ID NO: 313) 1.02 Y (1.4) SEQ ID NO: 302 (encoded,e.g., by SEQ ID NO: 301) 0.94 Y (1.9) SEQ ID NO: 106 (encoded, e.g., bySEQ ID NO: 105) 0.84 Y Y Y (8.8) SEQ ID NO: 120 (encoded, e.g., by SEQID NO: 119) 0.83 Y SEQ ID NO: 126 (encoded, e.g., by SEQ ID NO: 125)0.73 Y SEQ ID NO: 110 (encoded, e.g., by SEQ ID NO: 109) 0.67 SEQ ID NO:146 (encoded, e.g., by SEQ ID NO: 145) 0.66 Y Y (2.3) SEQ ID NO: 354(encoded, e.g., by SEQ ID NO: 353) 0.62 SEQ ID NO: 160 (encoded, e.g.,by SEQ ID NO: 159) 0.59 Y (5.5) SEQ ID NO: 176 (encoded, e.g., by SEQ IDNO: 175) 0.56 SEQ ID NO: 236 (encoded, e.g., by SEQ ID NO: 235) 0.56 SEQID NO: 246 (encoded, e.g., by SEQ ID NO: 245) 0.52 SEQ ID NO: 216(encoded, e.g., by SEQ ID NO: 215) 0.51 SEQ ID NO: 296 (encoded, e.g.,by SEQ ID NO: 295) 0.51 SEQ ID NO: 256 (encoded, e.g., by SEQ ID NO:255) 0.49 Y (4.2) SEQ ID NO: 186 (encoded, e.g., by SEQ ID NO: 185) 0.48Y (5) SEQ ID NO: 124 (encoded, e.g., by SEQ ID NO: 123) 0.48 SEQ ID NO:162 (encoded, e.g., by SEQ ID NO: 161) 0.48 SEQ ID NO: 270 (encoded,e.g., by SEQ ID NO: 269) 0.46 SEQ ID NO: 276 (encoded, e.g., by SEQ IDNO: 275) 0.45 SEQ ID NO: 190 (encoded, e.g., by SEQ ID NO: 189) 0.45 SEQID NO: 274 (encoded, e.g., by SEQ ID NO: 273) 0.45 SEQ ID NO: 214(encoded, e.g., by SEQ ID NO: 213) 0.44 SEQ ID NO: 290 (encoded, e.g.,by SEQ ID NO: 289) 0.44 SEQ ID NO: 306 (encoded, e.g., by SEQ ID NO:305) 0.42 SEQ ID NO: 118 (encoded, e.g., by SEQ ID NO: 117) 0.42 SEQ IDNO: 30 (encoded, e.g., by SEQ ID NO: 29) 0.42 SEQ ID NO: 144 (encoded,e.g., by SEQ ID NO: 143) 0.41 SEQ ID NO: 134 
(encoded, e.g., by SEQ IDNO: 133) 0.4 SEQ ID NO: 194 (encoded, e.g., by SEQ ID NO: 193) 0.39 SEQID NO: 210 (encoded, e.g., by SEQ ID NO: 209) 0.39 SEQ ID NO: 240(ENCODED BY SEQ ID NO: 239) 0.38 Y Y (2) SEQ ID NO: 278 (ENCODED BY SEQID NO: 277) 0.37 SEQ ID NO: 294 (ENCODED BY SEQ ID NO: 293) 0.37 SEQ IDNO: 170 (ENCODED BY SEQ ID NO: 169) 0.37 SEQ ID NO: 208 (ENCODED BY SEQID NO: 207) 0.37 SEQ ID NO: 128 (ENCODED BY SEQ ID NO: 127) 0.36 SEQ IDNO: 132 (ENCODED BY SEQ ID NO: 131) 0.36 SEQ ID NO: 158 (ENCODED BY SEQID NO: 157) 0.36 SEQ ID NO: 178 (ENCODED BY SEQ ID NO: 177) 0.36 SEQ IDNO: 166 (ENCODED BY SEQ ID NO: 165) 0.34 SEQ ID NO: 196 (ENCODED BY SEQID NO: 195) 0.34 Y Y (8.2) SEQ ID NO: 204 (ENCODED BY SEQ ID NO: 203)0.34 SEQ ID NO: 218 (ENCODED BY SEQ ID NO: 217) 0.33 Y SEQ ID NO: 242(ENCODED BY SEQ ID NO: 243) 0.33 SEQ ID NO: 154 (ENCODED BY SEQ ID NO:153) 0.29 Y (5.6) SEQ ID NO: 300 (ENCODED BY SEQ ID NO: 299) 0.28 SEQ IDNO: 338 (ENCODED BY SEQ ID NO: 337) 0.27 SEQ ID NO: 284 (ENCODED BY SEQID NO: 283) 0.27 SEQ ID NO: 112 (ENCODED BY SEQ ID NO: 111) 0.27 SEQ IDNO: 224 (ENCODED BY SEQ ID NO: 223) 0.27 SEQ ID NO: 136 (ENCODED BY SEQID NO: 135) 0.26 SEQ ID NO: 430 (ENCODED BY SEQ ID NO: 429) 0.26 SEQ IDNO: 198 (ENCODED BY SEQ ID NO: 197) 0.25 Y Y SEQ ID NO: 428 (ENCODED BYSEQ ID NO: 427) 0.25 Y Y (2.6) SEQ ID NO: 282 (ENCODED BY SEQ ID NO:281) 0.25 SEQ ID NO: 268 (ENCODED BY SEQ ID NO: 267) 0.25 SEQ ID NO: 152(ENCODED BY SEQ ID NO: 151) 0.24 Y Y (5.7) SEQ ID NO: 38 (ENCODED BY SEQID NO: 37) 0.23 SEQ ID NO: 292 (ENCODED BY SEQ ID NO: 291) 0.23 SEQ IDNO: 232 (ENCODED BY SEQ ID NO: 231) 0.23 SEQ ID NO: 234 (ENCODED BY SEQID NO: 233) 0.22 SEQ ID NO: 122 (ENCODED BY SEQ ID NO: 121) 0.22 SEQ IDNO: 142 (ENCODED BY SEQ ID NO: 141) 0.22 SEQ ID NO: 244 (ENCODED BY SEQID NO: 243) 0.21 SEQ ID NO: 138 (ENCODED BY SEQ ID NO: 137) 0.21 SEQ IDNO: 200 (ENCODED BY SEQ ID NO: 199) 0.21 SEQ ID NO: 116 (ENCODED BY SEQID NO: 115) 0.19 Y Y (4.2) SEQ ID NO: 114 (ENCODED BY SEQ ID NO: 
113)0.19 SEQ ID NO: 248 (ENCODED BY SEQ ID NO: 247) 0.19 SEQ ID NO: 360(ENCODED BY SEQ ID NO: 359) 0.18 SEQ ID NO: 184 (ENCODED BY SEQ ID NO:183) 0.18 SEQ ID NO: 192 (encoded by SEQ ID NO: 191) 0.18 SEQ ID NO: 222(ENCODED BY SEQ ID NO: 221) 0.17 SEQ ID NO: 140 (ENCODED BY SEQ ID NO:139) 0.16 SEQ ID NO: 168 (ENCODED BY SEQ ID NO: 167) 0.14 Y SEQ ID NO:182 (ENCODED BY SEQ ID NO: 181) 0.13 SEQ ID NO: 220 (ENCODED BY SEQ IDNO: 219) 0.13 SEQ ID NO: 260 (ENCODED BY SEQ ID NO: 259) 0.13 SEQ ID NO:262 (ENCODED BY SEQ ID NO: 261) 0.12 Y Y (0.4) SEQ ID NO: 280 (ENCODEDBY SEQ ID NO: 279) 0.12 SEQ ID NO: 258 (ENCODED BY SEQ ID NO: 257) 0.12SEQ ID NO: 108 (ENCODED BY SEQ ID NO: 107) 0.1 SEQ ID NO: 206 (ENCODEDBY SEQ ID NO: 205) 0.1 SEQ ID NO: 130 (ENCODED BY SEQ ID NO: 129) 0.08SEQ ID NO: 138 (ENCODED BY SEQ ID NO: 137) 0.08 SEQ ID NO: 286 (ENCODEDBY SEQ ID NO: 285) 0.08 SEQ ID NO: 316 (ENCODED BY SEQ ID NO: 315) 0.07Y (3) SEQ ID NO: 296 (ENCODED BY SEQ ID NO: 295) 0.07 SEQ ID NO: 288(ENCODED BY SEQ ID NO: 287) 0.07 SEQ ID NO: 202 (ENCODED BY SEQ ID NO:201) 0 Y Y Y (4.9) SEQ ID NO: 174 (ENCODED BY SEQ ID NO: 173) 0 Y SEQ IDNO: 238 (ENCODED BY SEQ ID NO: 237) 0 Y (1.3) SEQ ID NO: 416 (ENCODED BYSEQ ID NO: 415) 0 Y (1.7)
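The observation that performance on AVICEL® MCC does not track performance on alkaline PCS can be checked numerically from the table above. The Python sketch below computes a Pearson correlation coefficient over the 21 rows that report a percent conversion for severe alkPCS; each pair is (sugar released from AVICEL®, alkPCS % conversion) as transcribed from the table. The choice of Pearson correlation and the restriction to rows with a reported alkPCS value are assumptions made for this illustration, not an analysis described in the text.

```python
from math import sqrt
from typing import List, Tuple

# (sugar from AVICEL in 24 hr reaction, severe alkPCS % conversion), from the table above.
PAIRS: List[Tuple[float, float]] = [
    (1.67, 1.5), (1.59, 2.3), (1.51, 3.1), (1.02, 1.4), (0.94, 1.9),
    (0.84, 8.8), (0.66, 2.3), (0.59, 5.5), (0.49, 4.2), (0.48, 5.0),
    (0.38, 2.0), (0.34, 8.2), (0.29, 5.6), (0.25, 2.6), (0.24, 5.7),
    (0.19, 4.2), (0.12, 0.4), (0.07, 3.0), (0.0, 4.9), (0.0, 1.3),
    (0.0, 1.7),
]


def pearson_r(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of paired observations."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


print(len(PAIRS), round(pearson_r(PAIRS), 2))
```

A coefficient close to zero would be consistent with the statement that AVICEL® MCC performance is a poor predictor of alkPCS performance for these enzymes.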

Example 9 Identification and Characterization of Cellobiohydrolases and β-Glucosidases

This example describes the identification and characterization of cellobiohydrolases (CBHs) and β-glucosidases of the invention. In one aspect, CBHs and β-glucosidases of the invention are used to complement the hydrolysis activity of endoglucanases used to practice the methods of the invention, e.g., the endoglucanases of the invention, as discussed above. In summary, 89 active β-glucosidases of the invention and 28 active cellobiohydrolases of the invention were characterized.

Discovery was a combination of activity-based screens using model substrates (dye labeled sugars) and sequence-based discovery using probes designed from conserved sequences of known family 6 and 7 cellobiohydrolases. β-glucosidase discovery utilized a large number of bacterial gene libraries, mainly focusing on libraries generated from high temperature environments. New β-glucosidase enzymes of the invention so identified were subcloned into appropriate expression vectors and characterized for activity on dye-labeled substrates, as well as cellobiose and cellohexaose. Both pH and temperature optima were determined for each enzyme. In total 93 genes were analyzed for activity. Of the 93 subclones, 89 were shown to be active on the dye labeled substrate, pNP-β-glucopyranoside, see FIG. 31 and the Table, below. FIG. 31 graphically illustrates data showing the pH and temperature optima of 89 β-glucosidases of the invention. Exemplary β-glucosidases of the invention with pH_(opt), T_(opt) and specific activity on pNP-beta-glucopyranoside (pNP-β-glucopyranoside) are:

Enzyme                                          pHopt   Topt (° C.)   SA (U/mg)
SEQ ID NO: 254 (encoded by SEQ ID NO: 253)      5       60            2.25
SEQ ID NO: 264 (encoded by SEQ ID NO: 263)      5       80            43.93
SEQ ID NO: 340 (encoded by SEQ ID NO: 339)      7       60            10.86
SEQ ID NO: 364 (encoded by SEQ ID NO: 363)      5       60            3.83
SEQ ID NO: 356 (encoded by SEQ ID NO: 355)      6       60            0.89
SEQ ID NO: 326 (encoded by SEQ ID NO: 325)      5       37            14.94
SEQ ID NO: 358 (encoded by SEQ ID NO: 357)      7       60            4.06
SEQ ID NO: 320 (encoded by SEQ ID NO: 319)      6       60            2.75
SEQ ID NO: 346 (encoded by SEQ ID NO: 345)      6       37            0.43
SEQ ID NO: 348 (encoded by SEQ ID NO: 347)      6       60            0.264
SEQ ID NO: 362 (encoded by SEQ ID NO: 361)      6       80            3
SEQ ID NO: 342 (encoded by SEQ ID NO: 341)      6       60            3.5
SEQ ID NO: 336 (encoded by SEQ ID NO: 335)      7       37            0.00728
SEQ ID NO: 352 (encoded by SEQ ID NO: 351)      6       80            13.5
SEQ ID NO: 304 (encoded by SEQ ID NO: 303)      5       60            0.5
SEQ ID NO: 322 (encoded by SEQ ID NO: 321)      6       37            8.02
SEQ ID NO: 432 (encoded by SEQ ID NO: 431)      6       60            0.7
SEQ ID NO: 226 (encoded by SEQ ID NO: 225)      6       37            0.185
SEQ ID NO: 228 (encoded by SEQ ID NO: 227)      5       60            0.31
SEQ ID NO: 312 (encoded by SEQ ID NO: 311)      7       37            0.38
SEQ ID NO: 370 (encoded by SEQ ID NO: 369)      5       37            0.21
SEQ ID NO: 404 (encoded by SEQ ID NO: 403)      5       37            0.229
SEQ ID NO: 420 (encoded by SEQ ID NO: 419)      6       37            2.883
SEQ ID NO: 400 (encoded by SEQ ID NO: 399)      6       37            2.369
SEQ ID NO: 384 (encoded by SEQ ID NO: 383)      7       37            0.88
SEQ ID NO: 24 (encoded by SEQ ID NO: 23)        8       37            2.743
SEQ ID NO: 42 (encoded by SEQ ID NO: 41)        5       37            1.57
SEQ ID NO: 408 (encoded by SEQ ID NO: 407)      6       37            23.083
SEQ ID NO: 382 (encoded by SEQ ID NO: 381)      6       60            1.82
SEQ ID NO: 228 (encoded by SEQ ID NO: 227)      6       37            6.77
SEQ ID NO: 344 (encoded by SEQ ID NO: 343)      5       37            0.0339
SEQ ID NO: 332 (encoded by SEQ ID NO: 331)      5       37            0.492
SEQ ID NO: 150 (encoded by SEQ ID NO: 149)      6       80            4.26
SEQ ID NO: 230 (encoded by SEQ ID NO: 229)      6       37            2.699
SEQ ID NO: 310 (encoded by SEQ ID NO: 309)      7       37            0.963
SEQ ID NO: 94 (encoded by SEQ ID NO: 93)        6       60            164.026
SEQ ID NO: 6 (encoded by SEQ ID NO: 5)          5       37            0.263
SEQ ID NO: 298 (encoded by SEQ ID NO: 297)      5       37            0.172
SEQ ID NO: 376 (encoded by SEQ ID NO: 375)      5       37            0.489
SEQ ID NO: 148 (encoded by SEQ ID NO: 147)      5       37            0.24
SEQ ID NO: 386 (encoded by SEQ ID NO: 385)      5       37            0.25
SEQ ID NO: 350 (encoded by SEQ ID NO: 349)      5       37            0.172
SEQ ID NO: 18 (encoded by SEQ ID NO: 17)        5       37            0.346
SEQ ID NO: 50 (encoded by SEQ ID NO: 49)        5       37            0.619
SEQ ID NO: 424 (encoded by SEQ ID NO: 423)      6       37            10.263
SEQ ID NO: 422 (encoded by SEQ ID NO: 421)      5       37            0.178
SEQ ID NO: 8 (encoded by SEQ ID NO: 7)          5       37            0.0879
SEQ ID NO: 212 (encoded by SEQ ID NO: 211)      8       37            0.228
SEQ ID NO: 366 (encoded by SEQ ID NO: 365)      8       80            0.052
SEQ ID NO: 380 (encoded by SEQ ID NO: 379)      5       37            0.336
SEQ ID NO: 58 (encoded by SEQ ID NO: 57)        5       37            0.0455
SEQ ID NO: 58 (encoded by SEQ ID NO: 57)        5       37            0.0181
SEQ ID NO: 388 (encoded by SEQ ID NO: 387)      6       60            168
SEQ ID NO: 4 (encoded by SEQ ID NO: 3)          6       37            0.506
SEQ ID NO: 76 (encoded by SEQ ID NO: 75)        5       60            0.73
SEQ ID NO: 90 (encoded by SEQ ID NO: 89)        5       60            12.6
SEQ ID NO: 328 (encoded by SEQ ID NO: 327)      6       60            0.16
SEQ ID NO: 334 (encoded by SEQ ID NO: 333)      5       60            3.09
SEQ ID NO: 16 (encoded by SEQ ID NO: 15)        6       60            1.08
SEQ ID NO: 30 (encoded by SEQ ID NO: 29)        8       37            36.6
SEQ ID NO: 374 (encoded by SEQ ID NO: 373)      5       37            0.027
SEQ ID NO: 394 (encoded by SEQ ID NO: 393)      6       60            1.91
SEQ ID NO: 330 (encoded by SEQ ID NO: 329)      7       37            12.3
SEQ ID NO: 164 (encoded by SEQ ID NO: 163)      8       60            0.35
SEQ ID NO: 378 (encoded by SEQ ID NO: 377)      5       37            0.033
SEQ ID NO: 410 (encoded by SEQ ID NO: 409)      5       37            0.29
SEQ ID NO: 418 (encoded by SEQ ID NO: 417)      5       37            0.02
SEQ ID NO: 70 (encoded by SEQ ID NO: 69)        6       37            0.77
SEQ ID NO: 412 (encoded by SEQ ID NO: 411)      5       37            0.12
SEQ ID NO: 398 (encoded by SEQ ID NO: 397)      6       60            2.26
SEQ ID NO: 272 (encoded by SEQ ID NO: 271)      6       37            1.49
SEQ ID NO: 324 (encoded by SEQ ID NO: 323)      7       37            2.31
SEQ ID NO: 172 (encoded by SEQ ID NO: 171)      5       60            1.97
SEQ ID NO: 188 (encoded by SEQ ID NO: 187)      6       80            7.06
SEQ ID NO: 250 (encoded by SEQ ID NO: 249)      6       80            15.35
SEQ ID NO: 252 (encoded by SEQ ID NO: 251)      6       80            11.21
SEQ ID NO: 180 (encoded by SEQ ID NO: 179)      5       37            0.03
SEQ ID NO: 368 (encoded by SEQ ID NO: 367)      5       37            0.1
SEQ ID NO: 266 (encoded by SEQ ID NO: 265)      7       37            0.04
SEQ ID NO: 414 (encoded by SEQ ID NO: 413)      5       37            0.071
SEQ ID NO: 390 (encoded by SEQ ID NO: 389)      5       37            0.01
SEQ ID NO: 402 (encoded by SEQ ID NO: 401)      6       37            10.6
SEQ ID NO: 426 (encoded by SEQ ID NO: 425)      7       37            25.7
SEQ ID NO: 392 (encoded by SEQ ID NO: 391)      6       80            44
SEQ ID NO: 396 (encoded by SEQ ID NO: 395)      6       37            5.7
SEQ ID NO: 406 (encoded by SEQ ID NO: 405)      5       37            0.17
SEQ ID NO: 438 (encoded by SEQ ID NO: 437)      5       37            0.2
SEQ ID NO: 436 (encoded by SEQ ID NO: 435)      6       37            0.004
SEQ ID NO: 492 (encoded by SEQ ID NO: 491)      6       37            2.5

The activity of the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263), SEQ ID NO:94 (encoded by SEQ ID NO:93) and SEQ ID NO:388 (encoded by SEQ ID NO:387) enzymes was tested on cellobiose and cellohexaose. The exemplary SEQ ID NO:94 (encoded by SEQ ID NO:93) and SEQ ID NO:388 (encoded by SEQ ID NO:387) both were significantly more active on cellohexaose than on cellobiose, while SEQ ID NO:264 (encoded by SEQ ID NO:263) had almost equivalent activity on these two substrates. The K_(M) for cellobiose of SEQ ID NO:264 (encoded by SEQ ID NO:263) was determined to be approximately 2.5 mM, consistent with literature values for other similar enzymes. Based on these results the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263) was chosen as the top candidate to be used in the enzyme cocktails.

The protocols and design for discovery and characterization of CBH genes and enzymes were similar to those for the discovery and characterization of β-glucosidases of the invention, as discussed herein. Considering that all previously reported Family 6 and 7 cellobiohydrolases have been found in the fungi, we screened 150 strains from fungal libraries for the ability to degrade biomass, e.g., using pretreated corn stover. Nucleic acids (DNA and mRNA) were isolated from the positive strains and both genomic DNA and cDNA were probed for the presence of family 6 and 7 genes. Full length genes were recovered (with and without introns) and cloned into eukaryotic expression systems (Pichia, baculovirus and Cochliobolus heterostrophus). In total 41 full length genes were identified and analyzed. All 41 were cloned into Pichia and Cochliobolus; two were shown to be active in the Pichia constructs while 28 were active in the Cochliobolus constructs, as shown by the illustrated enzyme phylogenetic trees in FIG. 32. Activity was measured on AVICEL® microcrystalline cellulose and phosphoric acid swollen cellulose (PASC). FIG. 32 illustrates phylogenetic trees of discovered CBH genes of the invention; the blue arrows highlight T. reesei sequences. Genes that expressed and produced active protein in Pichia are indicated by the light blue symbols; those active in baculovirus are indicated by the yellow symbol and those active in Cochliobolus are indicated by the red symbols.

Example 10 Optimizing Content of Enzyme Cocktails of the Invention

This example describes the exemplary methods to accurately determine active enzyme in crude protein mixtures in order to optimize the composition of the enzyme cocktails of the invention and reduce overall protein content.

A combination of protein purification, SDS-PAGE analysis and enzyme assays allowed a semi-quantitative measure of the amount of active enzyme in each of the crude preparations. A systematic approach was taken to remove redundant and unnecessary enzymes from an exemplary enzyme mixture of the invention, the so-called “E10 cocktail”. It was determined that two of the enzymes, SEQ ID NO:442 (encoded by SEQ ID NO:441) (α-glucuronidase) and SEQ ID NO:440 (encoded by SEQ ID NO:439) (ferulic acid esterase), contributed very little to overall performance and were removed from the cocktail, resulting in an “E8” mixture. Experiments were carried out to determine which of the cellobiohydrolases (CBH I, CBH II, SEQ ID NO:98 (encoded by SEQ ID NO:97) and SEQ ID NO:34 (encoded by SEQ ID NO:33)) were the most effective in the cocktails. The performance of three different mixes was assessed (Case 1, 2 and 3).

The composition of each of these mixes is shown in the tables illustrated in FIG. 57 (Case 1: CBH I/CBH II), FIG. 58 (Case 2: CBH I/SEQ ID NO:98 (encoded by SEQ ID NO:97)), and FIG. 59 (Case 3: SEQ ID NO:34 (encoded by SEQ ID NO:33)/SEQ ID NO:98 (encoded by SEQ ID NO:97)). The tables show how much of each enzyme was used in the cocktails and estimates of active enzyme in each of these preparations. In addition, the tables show data calculated as mg enzyme/g cellulose for each reaction. In all three cases the total enzyme composition was tabulated and was below the 20 mg/g cellulose limit outlined in the target (Case 1 = 17.2 mg/g; Case 2 = 18.1 mg/g; and Case 3 = 16 mg/g).
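The comparison of total enzyme loading against the target limit can be sketched as follows. This is a minimal illustration in Python, not the procedure used to prepare the tables of FIGS. 57 to 59: the function and variable names are hypothetical, and only the 20 mg/g cellulose limit and the three case totals are taken from the text.

```python
# Limit stated in the text: mg active enzyme per g cellulose.
TARGET_LIMIT_MG_PER_G = 20.0

def total_loading(active_mg_by_enzyme, cellulose_g):
    """Sum active enzyme (mg) over all enzymes and normalize to g cellulose.

    `active_mg_by_enzyme` is a hypothetical mapping of enzyme name to the
    estimated mg of active enzyme added to the reaction.
    """
    if cellulose_g <= 0:
        raise ValueError("cellulose mass must be positive")
    return sum(active_mg_by_enzyme.values()) / cellulose_g

# Case totals reported in the text (mg/g cellulose).
reported_totals = {"Case 1": 17.2, "Case 2": 18.1, "Case 3": 16.0}

# True for every case whose total is below the target limit.
within_limit = {
    case: total < TARGET_LIMIT_MG_PER_G
    for case, total in reported_totals.items()
}
```

With per-enzyme estimates in hand, `total_loading` would be applied to each cocktail before the comparison; here the reported totals are checked directly.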

FIGS. 60 and 61 show the time courses of saccharification of Jaygo 2 (5% solids pretreated corn cob) using these three enzyme mixes (Case 1, 2 and 3, FIGS. 57 to 59, respectively). While there were some minor differences in rates between the cases, all three resulted in almost exactly 80% recovery of glucose and 62% recovery of xylose within 48 hrs. FIG. 60 data shows glucose release from Jaygo 2 (5% solids) catalyzed by three different exemplary E8 cocktails: CBH I/CBH II is the Case 1 table; CBH I/SEQ ID NO:98 (encoded by SEQ ID NO:97) is the Case 2 table; and SEQ ID NO:34 (encoded by SEQ ID NO:33)/SEQ ID NO:98 (encoded by SEQ ID NO:97) is the Case 3 table. Glucose concentration was determined by HPLC analysis of the saccharified liquors sampled at 4, 20, 30 and 48 hrs. Percent conversion was calculated by using 120 mM as 100% available glucose in the pretreated solids. Reaction conditions are pH 5.5 and 50° C.

FIG. 61 data shows xylose release from Jaygo 2 (5% solids pretreated corn cob) catalyzed by three different exemplary E8 cocktails: CBH I/CBH II is the Case 1 table; CBH I/SEQ ID NO:98 (encoded by SEQ ID NO:97) is the Case 2 table; and SEQ ID NO:34 (encoded by SEQ ID NO:33)/SEQ ID NO:98 (encoded by SEQ ID NO:97) is the Case 3 table. Xylose concentration was determined by HPLC analysis of the saccharified liquors sampled at 4, 20, 30 and 48 hrs. Percent conversion was calculated by using 113 mM as 100% available xylose in the pretreated solids. Reaction conditions are pH 5.5 and 50° C.

In summary:

Performance Parameters          Benchmark SPEZYME® enzyme*   Case 1   Case 2   Case 3
mg active enzyme/g cellulose    20                           17.2     18.1     16.0
% Conversion to glucose         80                           76       79       76
% Conversion to xylose          65                           61       63       62
Time for conversion (hr)        48                           48       48       48
% Solids                        2.5                          5        5        5

*Performance of SPEZYME® enzyme (15 FPU) on corn stover receiving the ‘severe’ alkaline pretreatment (15% NH₄OH, 170° C., 5 minute residence time) followed by disc-refining (0.010″ gap).

Example 11 Optimizing Content of Enzyme Cocktails of the Invention

This example describes the development and integration of a physical/chemical pretreatment of corn stover/fiber with enzymatic hydrolysis of complex polysaccharides to fermentable sugars using polypeptides of the invention. The invention provides methods for developing and evaluating enzymes for hydrolysis and saccharification of pretreated biomass, and enzymes for the hydrolysis and saccharification of pretreated biomass. In one aspect, in practicing the compositions and methods of the invention, overall capital costs and expenses of monomer sugar production can be reduced. The compositions and methods of the invention also provide feedstock for downstream production of fuels and chemicals.

The invention provides enzymes and enzyme mixes (“cocktails”) having endoglucanase, cellobiohydrolase and/or β-glucosidase activity for biomass conversion, e.g., for the saccharification of cellulose in pretreated stover. Enzymes of the invention have been demonstrated to have activity in the processing of actual biomass feedstocks, including cellulase- and hemicellulase-comprising compositions, as set forth herein. Selected hemicellulases will be included as resources permit. The invention also uses various analytical and robotic methods to characterize the enzymes of the invention, e.g., to characterize individual and combinations of cellulase and hemicellulase enzymes of the invention on model substrates and pretreated corn stover samples.

Enzymes of the invention include members of glycosyl hydrolase Families 5, 6, 8, 9, and 12 (for discussion of glycosyl hydrolase families see e.g., the CAZy(ModO) website database, as discussed by Coutinho, et al., (1999) Carbohydrate-active enzymes: an integrated database approach. In “Recent Advances in Carbohydrate Bioengineering”, H. J. Gilbert, et al., eds., The Royal Society of Chemistry, Cambridge, pp. 3-12; and, Coutinho, et al., (1999) The modular structure of cellulases and other carbohydrate-active enzymes: an integrated database approach. In “Genetics, Biochemistry and Ecology of Cellulose Degradation”, K. Ohmiya, et al., eds., Uni Publishers Co., Tokyo, pp. 15-23). Their discovery process comprised activity screening of 170 endoglucanase genes from environmental sources, including genes from glycosyl hydrolase Families 5, 6, 8, 9, and 12. These genes were subcloned and expressed, and then characterized on the soluble cellulose analog, carboxymethyl cellulose (CMC), and on microcrystalline cellulose (AVICEL® MCC). AVICEL® MCC, which is 50-60% crystalline, was chosen as the model substrate on the premise that performance on AVICEL® MCC would be predictive of performance on pretreated corn stover (PCS).

Performance at pH 5, 7 and 9, and 37° C., 60° C. and 80° C. was assayed to define pH and temperature optima for each endoglucanase. Although the final process may not demand thermostable enzymes, a high temperature saccharification may have improved rates of hydrolysis, therefore decreasing enzyme loading and/or residence time requirements. Activity on AVICEL® MCC was assessed by measuring the release of soluble sugars after a 24 hr incubation with enzyme. Ninety-five of the endoglucanases digested AVICEL® MCC to some extent, as illustrated in FIG. 64. Of these, 21 were optimally active at 60° C. and 14 were optimally active at 80° C. FIG. 64 data summarizes studies of pH and temperature optima of 95 enzymes on microcrystalline cellulose AVICEL® MCC (1%). Soluble reducing sugars were measured by a colorimetric assay (BCA, bicinchoninic acid).

The highest level of enzyme digestion (hydrolysis) observed was approximately 30% to 40%, with extent of digestion dependent on the conditions of the assay, including time, temperature and substrate concentration. The reaction stalled at this point, without going to completion. A combination of experiments suggested that the stalling was due to limited access to hydrolysable sites in the substrate, rather than to enzyme instability or product inhibition. This was supported by experiments with phosphoric acid-treated AVICEL® MCC (phosphoric acid swollen cellulose, PASC). Phosphoric acid treatment swells and reduces the crystallinity of cellulose, making it more accessible to enzymatic hydrolysis. PASC was 100% hydrolysable by the tested endoglucanases, as illustrated in FIG. 65. FIG. 65 graphically illustrates data showing the reaction time courses of the exemplary enzymes SEQ ID NO:318 (encoded by SEQ ID NO:317) and SEQ ID NO:308 (encoded by SEQ ID NO:307) with 1% (w/v) AVICEL® MCC and phosphoric acid swollen cellulose, PASC. The SEQ ID NO:318 (encoded by SEQ ID NO:317) reaction was run at 80° C. while SEQ ID NO:308 (encoded by SEQ ID NO:307) was run at 60° C. (both at pH 5). Both reaction mixtures contained 1 mg/ml total protein (<5% endoglucanase enzyme). Reaction products (cellobiose and glucose) were determined by HPLC and converted into % conversion based on initial substrate concentration. Glucose equivalent was calculated from G1 and G2×2. Complete conversion is equivalent to approximately 56 mM glucose.
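The glucose-equivalent calculation described above can be sketched in Python as follows. This is an illustrative sketch only: the function names and the example HPLC readings are hypothetical, while the G1 + 2×G2 rule and the approximately 56 mM value for complete conversion come from the text.

```python
def glucose_equivalents_mM(glucose_mM, cellobiose_mM):
    """Glucose equivalents: G1 plus two glucose units per cellobiose (G2)."""
    return glucose_mM + 2.0 * cellobiose_mM

def percent_conversion(glucose_equiv_mM, complete_mM=56.0):
    """Percent conversion relative to complete hydrolysis.

    The default of 56 mM is the approximate value given in the text for
    complete conversion of the 1% (w/v) substrate.
    """
    if complete_mM <= 0:
        raise ValueError("complete_mM must be positive")
    return 100.0 * glucose_equiv_mM / complete_mM

# Hypothetical HPLC readings, for illustration only (not measured data).
example_equiv = glucose_equivalents_mM(glucose_mM=2.0, cellobiose_mM=6.0)
example_percent = percent_conversion(example_equiv)
```

The same pattern would apply to the xylose and xylobiose data discussed later, with a different value for complete conversion.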

Following the observation that many of the screened endoglucanases (EGs) were active on AVICEL® MCC, the next step was to demonstrate activity on pretreated corn stover samples. Three different pretreated corn stover (PCS) samples were tested: steam PCS, dilute acid PCS and high severity alkaline pretreated corn stover (alkPCS). Not only do these differ in pretreatment method, they also differ in chemical (composition) and physical properties. All endoglucanases were tested for the ability to release soluble sugars from these three PCS samples using an automated, medium throughput screen (described below). The assays were performed at pH 5, 7 and 9, and at 37° C. or 50° C., with 1% solids and 0.25 mg/ml total protein (crude cell lysates). Soluble products were analyzed by a reducing sugar assay (BCA). Active hits on alkaline pretreated corn stover (alkPCS) were further confirmed in scaled-up reactions and products were analyzed by HPLC to determine conversion.

Listed below are enzymes tested along with the amount of reducing sugar produced from AVICEL® MCC and whether any reducing sugar was observed from the reaction with the three PCS samples (Y=yes). Included in the table is the amount of conversion for high severity alkaline PCS in 120 hrs. No entry indicates that there was no product formation. There does not seem to be a correlation between performance on AVICEL® MCC and performance on alkPCS. For example, the exemplary SEQ ID NO:106 (encoded by SEQ ID NO:105) performed the best on alkPCS but produced about half the amount of product on AVICEL® MCC as compared to the exemplary SEQ ID NO:434 (encoded by SEQ ID NO:433).

Furthermore, several clones were active on alkPCS but inactive on AVICEL® MCC (see SEQ ID NO:202 (encoded by SEQ ID NO:201), 10848 and 13626). Twenty-one enzymes have activity on alkPCS. These same enzymes were tested on “medium” and “low” severity alkPCS. FIG. 66 compares the amount of glucose released from these three samples under identical reaction conditions (1 mg/ml protein load, 2.2% solids, 5 ml). Complete digestion would result in approx. 56 mM glucose. FIG. 66 graphically illustrates data of studies showing glucose equivalent release from high, medium and low severity alkPCS by various endoglucanases (EGs) of the invention after 48 hr. Reaction conditions were 37° C., 50° C. or 80° C. and pH 5 or 7, depending on the optimum of the individual enzyme.

The exemplary SEQ ID NO:106 (encoded by SEQ ID NO:105) performed the best on the three alkaline pretreated corn stover samples. Time courses revealed that product release saturated at between 12% and 14% conversion. This observation is very similar to activity on AVICEL® MCC (see FIG. 64). Dose dependence of SEQ ID NO:106 (encoded by SEQ ID NO:105), as illustrated in FIG. 67, showed that the rate of glucose release increased with increased concentration of enzyme; however, the extent was not significantly affected. Limited conversion that is unaffected by increased enzyme loading suggests that there are limited accessible sites on cellulose in alkPCS and is not an indication of enzyme instability. Subsequent tasks (mainly Task 2.5, enzyme cocktails) are designed to address this issue. FIG. 67 graphically illustrates data of studies showing the dose dependence of the exemplary SEQ ID NO:106 (encoded by SEQ ID NO:105) (crude E. coli lysate). Conditions were 50° C., pH 5, 2.2% solids (high severity alkPCS).

Development of Medium Throughput Screening Methods and Analytical Tools

Robotic Assays.

The invention provides automated methods to assess performance (e.g., activity) of enzymes (e.g., to determine if a polypeptide is within the scope of the invention), including testing activity of the enzymes of the invention. These methods include dispensing solid substrates and enzymes into microtiter plates, incubation at several conditions and analyzing for products at regular time intervals, as illustrated, for example, in FIG. 68. FIG. 68 illustrates a schematic of an exemplary automated system of the invention developed to screen large numbers of enzymes and substrates.

A variety of product detection methods were tested and it was determined that the exemplary assay of the invention comprising use of a “BCA” (bicinchoninic acid) reducing sugar assay was the most robust due to its sensitivity and broad substrate specificity, as illustrated in FIG. 69. Development of these methods allowed carrying out of thousands of reactions per day. An exemplary assay using alkaline PCS and a series of endoglucanases is shown in FIG. 70. FIG. 69 illustrates standard curves of glucose on the automated system. Glucose standards were removed from microtiter plates at defined times, mixed with the BCA reagent and absorbance measured. FIG. 70 illustrates testing of six enzymes of the invention on high severity alkPCS at pH 5 and 50° C. The standard curves shown in (A) were used to convert absorbance into sugar concentration.
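Converting absorbance into sugar concentration by way of a glucose standard curve can be sketched as follows. This sketch assumes a linear response over the range of the standards, which the text does not state; the function names and the standard values are hypothetical and are not data from FIG. 69.

```python
def fit_standard_curve(concentrations_mM, absorbances):
    """Ordinary least-squares line: absorbance = slope * concentration + intercept."""
    n = len(concentrations_mM)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations_mM) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations_mM)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(concentrations_mM, absorbances)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def absorbance_to_concentration(absorbance, slope, intercept):
    """Invert the fitted line to estimate reducing sugar concentration (mM)."""
    return (absorbance - intercept) / slope

# Hypothetical glucose standards (mM) and absorbance readings, for illustration.
standards_mM = [0.0, 0.5, 1.0, 2.0]
readings = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_standard_curve(standards_mM, readings)
sample_mM = absorbance_to_concentration(0.80, slope, intercept)
```

Samples whose absorbance falls outside the range of the standards would need dilution or a wider curve before this inversion is meaningful.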

High Pressure Liquid Chromatography and Capillary Electrophoresis:

Once exemplary enzymes were identified using the high throughput systems, larger scale reactions (5 mL) were performed to validate performance and accurately measure product concentrations. Certain analytical tools had to be developed in order to discriminate between the various reaction products. These tools included high pressure liquid chromatography (HPLC, with RI or ELSD detection) and capillary electrophoresis (CE). Each has advantages and disadvantages; currently HPLC-RI is the workhorse method, while the other two are used under special circumstances. An example of HPLC-RI separation and detection of various sugars is shown in FIG. 71. FIG. 71 illustrates data from an HPLC separation of sugar monomers following enzymatic digestion of alkPCS. G2: cellobiose, G1: glucose, X1: xylose and Ara: arabinose.

Capillary electrophoresis (CE), a very fast and sensitive method, is used in methods of the invention to monitor product release, as illustrated in FIG. 72. Prior to capillary electrophoresis the reaction products were labeled with the fluorophore 1-aminopyrene-3,6,8-trisulfonate (APTS). Labeling with this compound resulted in sub-micromolar detection of sugars. There was baseline separation between the cello-oligosaccharides, which allowed for reaction profiling and quantitation of products. FIG. 72 summarizes the capillary electrophoresis separation of cello-oligosaccharides from cellobiose (G2) to cellohexaose (G6). Prior to electrophoresis the sugars were labeled with APTS.

Cellobiohydrolase and β-Glucosidase Discovery

High throughput activity-based screening methods were developed for β-glucosidase and cellobiohydrolase discovery. These screens utilized model chromophoric substrates, specifically resorufin-β-glucopyranoside (A) for the β-glucosidases and either 4-methylumbelliferone-cellobioside (B) or 4-methylumbelliferone-lactoside (C) for the cellobiohydrolases.

Close to 100 β-glucosidase genes were isolated during the discovery phase of the program. Bioinformatic analysis helped identify the most likely open reading frame for each gene. The genes were then subcloned into overexpression vectors for more detailed biochemical analysis. Sequence analysis showed that these β-glucosidase enzymes of the invention were almost equally distributed between the Family 1 and 3 glycosyl hydrolases; that is, each enzyme belongs to either Family 1 (“GH1”) or Family 3 (“GH3”) of the glycosyl hydrolase superfamily.

Biochemical characterization of these exemplary β-glucosidases involved determination of pH and temperature optima on a model chromophoric substrate, pNP-β-glucopyranoside (pNP-β-gluc). Crude, cell-free extracts were used in each assay and activity is reported as units/mg total protein. Activity measured in this manner is a reflection of both specific activity and gene expression. Sixty of the 93 subcloned β-glucosidases were assayed on pNP-β-gluc, as illustrated in FIG. 73. Fifty-one of the enzymes had measurable activity. FIG. 73 graphically illustrates data showing pH and temperature optima for 51 active β-glucosidases of the invention. Rates of hydrolysis of pNP-β-glucopyranoside were normalized to total protein in a crude E. coli lysate.

Using this assay, three exemplary enzymes of the invention, SEQ ID NO:264 (encoded by SEQ ID NO:263) (44 U/mg at pH 5 and 80° C.), SEQ ID NO:94 (encoded by SEQ ID NO:93) (164 U/mg at pH 6 and 60° C.) and SEQ ID NO:388 (encoded by SEQ ID NO:387) (160 U/mg at pH 6 and 60° C.), stand out from the rest. These three were chosen for more detailed kinetic analysis on cellobiose and cellohexaose. The exemplary SEQ ID NO:94 (encoded by SEQ ID NO:93) and SEQ ID NO:388 (encoded by SEQ ID NO:387) both were significantly more active on cellohexaose than on cellobiose, while SEQ ID NO:264 (encoded by SEQ ID NO:263) had almost equivalent activity on these two substrates. Since in one aspect cellobiose is the substrate in a bioconversion process of the invention, the kinetics of the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263) were analyzed on this substrate. FIG. 74 shows that the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263) has a K_(M) of ˜2.5 mM cellobiose without any sign of substrate inhibition up to 20 mM cellobiose. FIG. 74 graphically illustrates a Michaelis-Menten plot of activity of the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263) with the substrate cellobiose. The line represents a fit to the Michaelis-Menten equation with a K_(M) of approximately 2.5 mM.
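The Michaelis-Menten relationship underlying the fit can be illustrated with a short Python sketch. Only the K_(M) of approximately 2.5 mM and the 20 mM upper cellobiose concentration come from the text; the function name is hypothetical, and because no V_max is reported here the rate is expressed relative to an assumed V_max of 1.0.

```python
def michaelis_menten_rate(substrate_mM, vmax=1.0, km_mM=2.5):
    """Michaelis-Menten rate: v = Vmax * [S] / (K_M + [S]).

    vmax defaults to 1.0 (an assumption) so the result reads as a fraction
    of Vmax; km_mM defaults to the approximate K_M given in the text.
    """
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be non-negative and K_M positive")
    return vmax * substrate_mM / (km_mM + substrate_mM)

# Fraction of Vmax at a few cellobiose concentrations up to 20 mM.
for s in (0.5, 2.5, 10.0, 20.0):
    print(f"{s:5.1f} mM cellobiose: {michaelis_menten_rate(s):.2f} x Vmax")
```

By definition the rate is half of V_max when the cellobiose concentration equals K_(M), and at 20 mM the model gives 20/22.5, or roughly 89% of V_max. A model with substrate inhibition would instead fall off at high concentration, which the text reports was not observed.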

In addition to activity-based screening utilizing environmental DNA libraries, sequence-based screening of fungal strains isolated from high throughput culturing (HTC) was undertaken. Approximately 150 unique fungal strains were isolated using high-throughput cultivation (HTC) technology; these strains were screened for the ability to consume AVICEL® MCC, steam pretreated corn stover and dilute acid pretreated corn stover. The enrichment was set up in such a way that the sole source of carbon was the cellulosic substrate; hence growth was dependent upon the ability of the fungus to digest the polymeric substrates to metabolizable sugars. Trichoderma reesei RutC30 (from ATCC) was used as the benchmark. Seventeen fungal strains outperformed T. reesei by consuming PCS in a shorter period of time. Genomic DNA and cDNA were isolated from the 10 best strains to be used as sources for novel cellobiohydrolase genes.

Family 6 (CBH II) and Family 7 (CBH I) Cellobiohydrolases.

PCR of genomic DNA and cDNA using the degenerate oligonucleotide primers resulted in the isolation of 59 partial unique cellobiohydrolase genes, 41 family 7 and 18 family 6. From the 59 partial genes 55 were recovered as full length. A partial phylogenetic analysis of the catalytic domains from these enzymes showed that there are a number of enzymes that are quite dissimilar to the known cellobiohydrolases.

The invention provides methods for expressing fungal cellobiohydrolase, including heterologous expression of these genes in fungal systems. Eukaryotic expression systems used to practice the invention include: (1) a yeast, e.g., a Pichia, e.g., Pichia pastoris; and/or, (2) a fungus, e.g., a Cochliobolus, e.g., Cochliobolus heterostrophus. Pichia expression results in secretion into the culture broth; systems are commercially available. Thirty-seven CBH genes were subcloned into Pichia vectors. Activity was detected in the exemplary SEQ ID NO:450 (encoded, e.g., by SEQ ID NO:449) enzyme (Family 6 homolog) in both small (microtiter plate) and large (30 L fermentor) scale. SDS-PAGE analysis (as illustrated in FIG. 75) and activity assays (FIG. 76) of culture broths from the 30 L fermentor showed substantial accumulation of protein and activity. In one aspect, these genes, including the coding sequence for the exemplary SEQ ID NO:450 (encoded, e.g., by SEQ ID NO:449) enzyme, are subcloned into the Cochliobolus expression system. For the SDS-PAGE (FIG. 75) and activity assays (FIG. 76) on samples removed from a fermentor of SEQ ID NO:450 (encoded, e.g., by SEQ ID NO:449), activity was measured on PASC.

Digestion of PCS (Enzyme Cocktails)

As discussed above, the invention provides various mixtures, or “cocktails”, of enzymes for biomass conversion, including cocktails comprising enzymes of the invention (in whole or in part). Combinations of the invention, including enzymes of the invention, e.g., cellulases and hemicellulases, can result in both synergistic and additive effects and yield higher levels of conversion than possible with single enzymes.

In one aspect of the methods of the invention, the hemicellulose fraction of a biomass is removed because removal of the hemicellulose fraction reveals previously inaccessible cellulose that would become digestible. The exemplary endoglucanases SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) or SEQ ID NO:308 (encoded, e.g., by SEQ ID NO:307) were combined with the exemplary xylanase SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) (individual activity of this enzyme is presented elsewhere herein). FIG. 77 shows that the presence of the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) in the reaction mix enhances the rate of glucose release by SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) but has little effect on the overall extent. On the other hand, the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) enhances both the rate and extent of glucose release by the exemplary SEQ ID NO:308 (encoded, e.g., by SEQ ID NO:307) (SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) alone did not result in any glucose release). In this aspect, the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) makes the substrate more accessible to these two endoglucanases. Interestingly, at least in these assays, the exemplary SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) has some amount of xylanase activity whereas SEQ ID NO:308 (encoded, e.g., by SEQ ID NO:307) does not, perhaps explaining the difference in performance between the two enzymes in these studies. Another observation is that in both cases the extent saturates in the 6-7 mM range of glucose (approximately 12% conversion), suggesting that this is the maximum level of conversion attainable by an endoglucanase alone. FIG. 77 graphically illustrates data showing the effect of cellulose hydrolysis by combining the exemplary xylanase SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) with the exemplary endoglucanase SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) (FIG. 77A) or SEQ ID NO:308 (encoded, e.g., by SEQ ID NO:307) (FIG. 77B). The substrate was high severity alkPCS (2.2% solids) at 50° C., pH 5. Enzyme concentrations were 1 mg/ml each.

Combinatorial work was extended to include other enzyme types, specifically the exemplary β-glucosidase SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263) and one or more cellobiohydrolases (T. reesei CBH I and CBH II). FIGS. 78A and 78B show that the cellobiohydrolases were able to enhance the digestion of alkPCS, with CBH I being more effective than CBH II. This combination of enzymes reached approximately 55% conversion, compared with approximately 12% with the exemplary SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) alone. Further inspection of the data shows that all enzymes were required to reach this high level of conversion and that the effect was synergistic rather than additive. Thus, the invention provides an enzyme cocktail comprising the exemplary β-glucosidase SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263) and one or more cellobiohydrolase(s).

FIG. 78 graphically illustrates data showing the effect of cellulose hydrolysis using an enzyme mixture of the invention, made by combining the exemplary xylanase of the invention SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443), the exemplary endoglucanase of the invention SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), the exemplary β-glucosidase of the invention SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263), and CBH I (FIG. 78A) or CBH II (FIG. 78B). The substrate was high severity alkPCS (2.2% solids) at 50° C., pH 5. Enzyme concentrations were 1 mg/ml each for SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443), SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) and SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263), and 0.05 mg/mL each for CBH I and CBH II.

Hemicellulase Characterization

The invention also provides polypeptides having hemicellulase activity, and methods of using them, e.g., in processing biomass. Initial discovery comprised surveying a number of hemicellulases in order to: (1) release hemicellulose sugar monomers and (2) enhance the activity of the cellulases by uncovering additional active sites. Over 200 xylanases, including Family 10 and 11, were tested. Based on characterization data two enzymes of the invention, SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) and SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99), were chosen for further characterization on hemicellulose processing. The exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) was assayed for the ability to digest the hemicellulose component of high severity alkPCS. FIG. 79 shows the time courses for three different enzyme loadings (0.2, 1 and 5 mg/ml based on total protein in a crude lysate) using 2.2% solids (approx. 0.6% xylan) at 50° C. and pH 5 in 5 ml. Products (xylose and xylobiose) were monitored by HPLC-RI and the data were converted to “xylose equivalents” by multiplying xylobiose concentrations by 2. There is an initial very fast (<6 hrs) release of xylose and xylobiose and then an extended slower phase reaching saturation at between 40% and 50% conversion. Higher enzyme loading resulted in increased rate of conversion but no additional increase in extent. Preliminary experiments suggested that saturation was not due to enzyme instability but more than likely due to substrate inaccessibility. It is possible that limited reactivity is due to branch points in hemicellulose that the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) cannot digest through. Thus, in one aspect, additional enzymes are added in this process such that arabinose and glucuronic acid groups are cleaved (the invention providing enzyme cocktails comprising the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) and arabinose- and glucuronic acid group-cleaving enzymes). Reactions of the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) with medium and low severity alkPCS showed a similar pattern with approximately the same amount of product release. In one aspect, the invention provides enzyme cocktails comprising the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) and endoglucanase, β-glucosidase and/or cellobiohydrolase enzymes; see discussion, above.

The following chart summarizes activity of exemplary enzymes of the invention, including the concentration of sugar (“[sugar]”, mM) released from AVICEL® MCC in a 24 hr reaction, and activity on steam pretreated corn stover (Steam PCS), acid pretreated corn stover (acid PCS), and high severity alk-PCS (% conversion):

[sugar] from AVICEL ® High severity MCC in 24 hr Steam acid alk-PCS (%Enzyme reaction, mM PCS PCS conversion) SEQ ID NO: 434 (encoded, e.g.,by SEQ ID NO: 433) 1.79 SEQ ID NO: 156 (encoded, e.g., by SEQ ID NO:155) 1.67 Y (1.5) SEQ ID NO: 308 (encoded, e.g., by SEQ ID NO: 307) 1.59Y Y (2.3) SEQ ID NO: 318 (encoded, e.g., by SEQ ID NO: 317) 1.51 Y Y Y(3.1) SEQ ID NO: 372 (encoded, e.g., by SEQ ID NO: 371) 1.18 SEQ ID NO:314 (encoded, e.g., by SEQ ID NO: 313) 1.02 Y (1.4) SEQ ID NO: 302(encoded, e.g., by SEQ ID NO: 301) 0.94 Y (1.9) SEQ ID NO: 106 (encoded,e.g., by SEQ ID NO: 105) 0.84 Y Y Y (8.8) SEQ ID NO: 120 (encoded, e.g.,by SEQ ID NO: 119) 0.83 Y SEQ ID NO: 126 (encoded, e.g., by SEQ ID NO:125) 0.73 Y SEQ ID NO: 110 (encoded, e.g., by SEQ ID NO: 109) 0.67 SEQID NO: 146 (ENCODED BY SEQ ID NO: 145) 0.66 Y Y (2.3) SEQ ID NO: 354(ENCODED BY SEQ ID NO: 353) 0.62 SEQ ID NO: 160 (ENCODED BY SEQ ID NO:159) 0.59 Y (5.5) SEQ ID NO: 176 (ENCODED BY SEQ ID NO: 175) 0.56 SEQ IDNO: 236 (ENCODED BY SEQ ID NO: 235) 0.56 SEQ ID NO: 246 (ENCODED BY SEQID NO: 245) 0.52 SEQ ID NO: 216 (ENCODED BY SEQ ID NO: 215) 0.51 SEQ IDNO: 296 (ENCODED BY SEQ ID NO: 295) 0.51 SEQ ID NO: 256 (ENCODED BY SEQID NO: 255) 0.49 Y (4.2) SEQ ID NO: 186 (ENCODED BY SEQ ID NO: 185) 0.48Y (5) SEQ ID NO: 124 (ENCODED BY SEQ ID NO: 123) 0.48 SEQ ID NO: 162(ENCODED BY SEQ ID NO: 161) 0.48 SEQ ID NO: 270 (ENCODED BY SEQ ID NO:269) 0.46 SEQ ID NO: 276 (ENCODED BY SEQ ID NO: 275) 0.45 SEQ ID NO: 190(ENCODED BY SEQ ID NO: 189) 0.45 SEQ ID NO: 274 (ENCODED BY SEQ ID NO:273) 0.45 SEQ ID NO: 214 (ENCODED BY SEQ ID NO: 213) 0.44 SEQ ID NO: 290(ENCODED BY SEQ ID NO: 289) 0.44 SEQ ID NO: 306 (ENCODED BY SEQ ID NO:305) 0.42 SEQ ID NO: 118 (ENCODED BY SEQ ID NO: 117) 0.42 SEQ ID NO: 30(encoded by SEQ ID NO: 29) 0.42 SEQ ID NO: 144 (ENCODED BY SEQ ID NO:143) 0.41 SEQ ID NO: 134 (ENCODED BY SEQ ID NO: 133) 0.4 SEQ ID NO: 194(ENCODED BY SEQ ID NO: 193) 0.39 SEQ ID NO: 318 (ENCODED BY SEQ ID NO:317) SEQ ID NO: 210 (ENCODED 
BY SEQ ID NO: 209) 0.39 SEQ ID NO: 240(ENCODED BY SEQ ID NO: 239) 0.38 Y Y (2) SEQ ID NO: 278 (ENCODED BY SEQID NO: 277) 0.37 SEQ ID NO: 294 (ENCODED BY SEQ ID NO: 293) 0.37 SEQ IDNO: 170 (ENCODED BY SEQ ID NO: 169) 0.37 SEQ ID NO: 208 (encoded by SEQID NO: 207) 0.37 SEQ ID NO: 128 (ENCODED BY SEQ ID NO: 127) 0.36 SEQ IDNO: 132 (ENCODED BY SEQ ID NO: 131) 0.36 SEQ ID NO: 158 (ENCODED BY SEQID NO: 157) 0.36 SEQ ID NO: 178 (ENCODED BY SEQ ID NO: 177) 0.36 SEQ IDNO: 166 (ENCODED BY SEQ ID NO: 165) 0.34 SEQ ID NO: 196 (ENCODED BY SEQID NO: 195) 0.34 Y Y (8.2) SEQ ID NO: 204 (ENCODED BY SEQ ID NO: 203)0.34 SEQ ID NO: 218 (ENCODED BY SEQ ID NO: 217) 0.33 Y SEQ ID NO: 242(ENCODED BY SEQ ID NO: 243) 0.33 SEQ ID NO: 154 (ENCODED BY SEQ ID NO:153) 0.29 Y (5.6) SEQ ID NO: 300 (ENCODED BY SEQ ID NO: 299) 0.28 SEQ IDNO: 338 (ENCODED BY SEQ ID NO: 337) 0.27 SEQ ID NO: 284 (ENCODED BY SEQID NO: 283) 0.27 SEQ ID NO: 112 (ENCODED BY SEQ ID NO: 111) 0.27 SEQ IDNO: 224 (ENCODED BY SEQ ID NO: 223) 0.27 SEQ ID NO: 136 (ENCODED BY SEQID NO: 135) 0.26 SEQ ID NO: 430 (ENCODED BY SEQ ID NO: 429) 0.26 SEQ IDNO: 198 (ENCODED BY SEQ ID NO: 197) 0.25 Y Y SEQ ID NO: 428 (ENCODED BYSEQ ID NO: 427) 0.25 Y Y (2.6) SEQ ID NO: 282 (ENCODED BY SEQ ID NO:281) 0.25 SEQ ID NO: 268 (ENCODED BY SEQ ID NO: 267) 0.25 SEQ ID NO: 152(ENCODED BY SEQ ID NO: 151) 0.24 Y Y (5.7) SEQ ID NO: 38 (ENCODED BY SEQID NO: 37) 0.23 SEQ ID NO: 292 (ENCODED BY SEQ ID NO: 291) 0.23 SEQ IDNO: 232 (ENCODED BY SEQ ID NO: 231) 0.23 SEQ ID NO: 234 (ENCODED BY SEQID NO: 233) 0.22 SEQ ID NO: 122 (ENCODED BY SEQ ID NO: 121) 0.22 SEQ IDNO: 142 (ENCODED BY SEQ ID NO: 141) 0.22 SEQ ID NO: 244 (ENCODED BY SEQID NO: 243) 0.21 SEQ ID NO: 138 (ENCODED BY SEQ ID NO: 137) 0.21 SEQ IDNO: 200 (ENCODED BY SEQ ID NO: 199) 0.21 SEQ ID NO: 116 (ENCODED BY SEQID NO: 115) 0.19 Y Y (4.2) SEQ ID NO: 114 (ENCODED BY SEQ ID NO: 113)0.19 SEQ ID NO: 248 (ENCODED BY SEQ ID NO: 247) 0.19 SEQ ID NO: 360(ENCODED BY SEQ ID NO: 359) 0.18 SEQ ID NO: 184 (ENCODED 
BY SEQ ID NO:183) 0.18 SEQ ID NO: 192 (ENCODED BY SEQ ID NO: 191) 0.18 SEQ ID NO: 222(ENCODED BY SEQ ID NO: 221) 0.17 SEQ ID NO: 140 (ENCODED BY SEQ ID NO:139) 0.16 SEQ ID NO: 168 (ENCODED BY SEQ ID NO: 167) 0.14 Y SEQ ID NO:182 (ENCODED BY SEQ ID NO: 181) 0.13 SEQ ID NO: 220 (ENCODED BY SEQ IDNO: 219) 0.13 SEQ ID NO: 260 (ENCODED BY SEQ ID NO: 259) 0.13 SEQ ID NO:262 (ENCODED BY SEQ ID NO: 261) 0.12 Y Y (0.4) SEQ ID NO: 280 (ENCODEDBY SEQ ID NO: 279) 0.12 SEQ ID NO: 258 (ENCODED BY SEQ ID NO: 257) 0.12SEQ ID NO: 108 (ENCODED BY SEQ ID NO: 107) 0.1 SEQ ID NO: 206 (ENCODEDBY SEQ ID NO: 205) 0.1 SEQ ID NO: 130 (ENCODED BY SEQ ID NO: 129) 0.08SEQ ID NO: 138 (ENCODED BY SEQ ID NO: 137) 0.08 SEQ ID NO: 286 (ENCODEDBY SEQ ID NO: 285) 0.08 SEQ ID NO: 316 (ENCODED BY SEQ ID NO: 315) 0.07Y (3) SEQ ID NO: 296 (ENCODED BY SEQ ID NO: 295) 0.07 SEQ ID NO: 288(ENCODED BY SEQ ID NO: 287) 0.07 SEQ ID NO: 202 (ENCODED BY SEQ ID NO:201) 0 Y Y Y (4.9) SEQ ID NO: 174 (ENCODED BY SEQ ID NO: 173) 0 Y SEQ IDNO: 238 (ENCODED BY SEQ ID NO: 237) 0 Y (1.3) SEQ ID NO: 416 (ENCODED BYSEQ ID NO: 415) 0 Y (1.7)

Example 12 Optimizing Content of Enzyme Cocktails of the Invention

This example describes the discovery and development of enzymes, and mixes (“cocktails”) of cellulolytic enzymes, for the hydrolysis of biomass, e.g., plant products, such as pretreated corn stover. This example describes the development and integration of a physical/chemical pretreatment of corn stover/fiber with an enzymatic hydrolysis (using a “cocktail” mix of enzymes of the invention) of complex polysaccharides to fermentable sugars. The invention provides enzymes and enzyme mixes for hydrolysis and saccharification of pretreated biomass. In one aspect, the invention provides enzymes and enzyme mixes using endoglucanases, cellobiohydrolases and/or β-glucosidases for biomass processing, e.g., saccharification of cellulose, for example, in pretreated stover. In one aspect, the enzyme cocktails of the invention are used to generate commercial cellulase and hemicellulase products, e.g., pretreated corn materials.

Cellobiohydrolase and β-Glucosidase Discovery

The invention provides enzymes that can take the products of an endoglucanase reaction on cellulosic substrates and convert them into monomer glucose. β-glucosidases were discovered, subcloned, expressed and characterized using assays designed to detect cellobiohydrolases and β-glucosidases. In summary, 89 active β-glucosidases and 28 active cellobiohydrolases were characterized.

Discovery was a combination of activity-based screens using model substrates (dye-labeled sugars) and sequence-based discovery using probes designed from conserved sequences of known family 6 and 7 cellobiohydrolases. Discovery resulted in 16 new β-glucosidase genes, in addition to 77 other genes, for a total of 93. These genes were subcloned into appropriate expression vectors and characterized for activity on dye-labeled substrates, cellobiose and cellohexaose. Both pH and temperature optima were determined for each enzyme. Of the 93 subclones analyzed for activity, 89 were shown to be active on the dye-labeled substrate, pNP-β-glucopyranoside, under a broad range of conditions, as illustrated in FIG. 80 and the table (list), below; FIG. 80 graphically illustrates data showing pH and temperature optima of the screened β-glucosidases. The table summarizes β-glucosidases of the invention with pH_(opt), T_(opt) and specific activity (SA) on the substrate pNP-β-glucopyranoside:

Enzyme                                      pH_(opt)  T_(opt) (° C.)  SA (U/mg)
SEQ ID NO: 254 (encoded by SEQ ID NO: 253)  5         60              2.25
SEQ ID NO: 264 (encoded by SEQ ID NO: 263)  5         80              43.93
SEQ ID NO: 340 (encoded by SEQ ID NO: 339)  7         60              10.86
SEQ ID NO: 364 (encoded by SEQ ID NO: 363)  5         60              3.83
SEQ ID NO: 356 (encoded by SEQ ID NO: 355)  6         60              0.89
SEQ ID NO: 326 (encoded by SEQ ID NO: 325)  5         37              14.94
SEQ ID NO: 358 (encoded by SEQ ID NO: 357)  7         60              4.06
SEQ ID NO: 320 (encoded by SEQ ID NO: 319)  6         60              2.75
SEQ ID NO: 346 (encoded by SEQ ID NO: 345)  6         37              0.43
SEQ ID NO: 348 (encoded by SEQ ID NO: 347)  6         60              0.264
SEQ ID NO: 362 (encoded by SEQ ID NO: 361)  6         80              3
SEQ ID NO: 342 (encoded by SEQ ID NO: 341)  6         60              3.5
SEQ ID NO: 336 (encoded by SEQ ID NO: 335)  7         37              0.00728
SEQ ID NO: 352 (encoded by SEQ ID NO: 351)  6         80              13.5
SEQ ID NO: 304 (encoded by SEQ ID NO: 303)  5         60              0.5
SEQ ID NO: 322 (encoded by SEQ ID NO: 321)  6         37              8.02
SEQ ID NO: 432 (encoded by SEQ ID NO: 431)  6         60              0.7
SEQ ID NO: 226 (encoded by SEQ ID NO: 225)  6         37              0.185
SEQ ID NO: 228 (encoded by SEQ ID NO: 227)  5         60              0.31
SEQ ID NO: 312 (encoded by SEQ ID NO: 311)  7         37              0.38
SEQ ID NO: 370 (encoded by SEQ ID NO: 369)  5         37              0.21
SEQ ID NO: 404 (encoded by SEQ ID NO: 403)  5         37              0.229
SEQ ID NO: 420 (encoded by SEQ ID NO: 419)  6         37              2.883
SEQ ID NO: 400 (encoded by SEQ ID NO: 399)  6         37              2.369
SEQ ID NO: 384 (encoded by SEQ ID NO: 383)  7         37              0.88
SEQ ID NO: 24 (encoded by SEQ ID NO: 23)    8         37              2.743
SEQ ID NO: 42 (encoded by SEQ ID NO: 41)    5         37              1.57
SEQ ID NO: 408 (encoded by SEQ ID NO: 407)  6         37              23.083
SEQ ID NO: 382 (encoded by SEQ ID NO: 381)  6         60              1.82
SEQ ID NO: 228 (encoded by SEQ ID NO: 227)  6         37              6.77
SEQ ID NO: 344 (encoded by SEQ ID NO: 343)  5         37              0.0339
SEQ ID NO: 332 (encoded by SEQ ID NO: 331)  5         37              0.492
SEQ ID NO: 150 (encoded by SEQ ID NO: 149)  6         80              4.26
SEQ ID NO: 230 (encoded by SEQ ID NO: 229)  6         37              2.699
SEQ ID NO: 310 (encoded by SEQ ID NO: 309)  7         37              0.963
SEQ ID NO: 94 (encoded by SEQ ID NO: 93)    6         60              164.026
SEQ ID NO: 6 (encoded by SEQ ID NO: 5)      5         37              0.263
SEQ ID NO: 298 (encoded by SEQ ID NO: 297)  5         37              0.172
SEQ ID NO: 376 (encoded by SEQ ID NO: 375)  5         37              0.489
SEQ ID NO: 148 (encoded by SEQ ID NO: 147)  5         37              0.24
SEQ ID NO: 386 (encoded by SEQ ID NO: 385)  5         37              0.25
SEQ ID NO: 350 (encoded by SEQ ID NO: 349)  5         37              0.172
SEQ ID NO: 18 (encoded by SEQ ID NO: 17)    5         37              0.346
SEQ ID NO: 50 (encoded by SEQ ID NO: 49)    5         37              0.619
SEQ ID NO: 424 (encoded by SEQ ID NO: 423)  6         37              10.263
SEQ ID NO: 422 (encoded by SEQ ID NO: 421)  5         37              0.178
SEQ ID NO: 8 (encoded by SEQ ID NO: 7)      5         37              0.0879
SEQ ID NO: 212 (encoded by SEQ ID NO: 211)  8         37              0.228
SEQ ID NO: 366 (encoded by SEQ ID NO: 365)  8         80              0.052
SEQ ID NO: 380 (encoded by SEQ ID NO: 379)  5         37              0.336
SEQ ID NO: 58 (encoded by SEQ ID NO: 57)    5         37              0.0455
SEQ ID NO: 58 (encoded by SEQ ID NO: 57)    5         37              0.0181
SEQ ID NO: 388 (encoded by SEQ ID NO: 387)  6         60              168
SEQ ID NO: 4 (encoded by SEQ ID NO: 3)      6         37              0.506
SEQ ID NO: 76 (encoded by SEQ ID NO: 75)    5         60              0.73
SEQ ID NO: 90 (encoded by SEQ ID NO: 89)    5         60              12.6
SEQ ID NO: 328 (encoded by SEQ ID NO: 327)  6         60              0.16
SEQ ID NO: 334 (encoded by SEQ ID NO: 333)  5         60              3.09
SEQ ID NO: 16 (encoded by SEQ ID NO: 15)    6         60              1.08
SEQ ID NO: 30 (encoded by SEQ ID NO: 29)    8         37              36.6
SEQ ID NO: 374 (encoded by SEQ ID NO: 373)  5         37              0.027
SEQ ID NO: 394 (encoded by SEQ ID NO: 393)  6         60              1.91
SEQ ID NO: 330 (encoded by SEQ ID NO: 329)  7         37              12.3
SEQ ID NO: 164 (encoded by SEQ ID NO: 163)  8         60              0.35
SEQ ID NO: 378 (encoded by SEQ ID NO: 377)  5         37              0.033
SEQ ID NO: 410 (encoded by SEQ ID NO: 409)  5         37              0.29
SEQ ID NO: 418 (encoded by SEQ ID NO: 417)  5         37              0.02
SEQ ID NO: 70 (encoded by SEQ ID NO: 69)    6         37              0.77
SEQ ID NO: 412 (encoded by SEQ ID NO: 411)  5         37              0.12
SEQ ID NO: 398 (encoded by SEQ ID NO: 397)  6         60              2.26
SEQ ID NO: 272 (encoded by SEQ ID NO: 271)  6         37              1.49
SEQ ID NO: 324 (encoded by SEQ ID NO: 323)  7         37              2.31
SEQ ID NO: 172 (encoded by SEQ ID NO: 171)  5         60              1.97
SEQ ID NO: 188 (encoded by SEQ ID NO: 187)  6         80              7.06
SEQ ID NO: 250 (encoded by SEQ ID NO: 249)  6         80              15.35
SEQ ID NO: 252 (encoded by SEQ ID NO: 251)  6         80              11.21
SEQ ID NO: 180 (encoded by SEQ ID NO: 179)  5         37              0.03
SEQ ID NO: 368 (encoded by SEQ ID NO: 367)  5         37              0.1
SEQ ID NO: 266 (encoded by SEQ ID NO: 265)  7         37              0.04
SEQ ID NO: 414 (encoded by SEQ ID NO: 413)  5         37              0.071
SEQ ID NO: 390 (encoded by SEQ ID NO: 389)  5         37              0.01
SEQ ID NO: 402 (encoded by SEQ ID NO: 401)  6         37              10.6
SEQ ID NO: 426 (encoded by SEQ ID NO: 425)  7         37              25.7
SEQ ID NO: 392 (encoded by SEQ ID NO: 391)  6         80              44
SEQ ID NO: 396 (encoded by SEQ ID NO: 395)  6         37              5.7
SEQ ID NO: 406 (encoded by SEQ ID NO: 405)  5         37              0.17
SEQ ID NO: 438 (encoded by SEQ ID NO: 437)  5         37              0.2
SEQ ID NO: 436 (encoded by SEQ ID NO: 435)  6         37              0.004
SEQ ID NO: 492 (encoded by SEQ ID NO: 491)  6         37              2.5
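One way to work with a table of this kind is to shortlist enzymes whose optima fit a target process window and rank them by specific activity. The sketch below is illustrative only: the `Enzyme` type, the `shortlist` function and the filter thresholds (pH_(opt) of 5 to 6, T_(opt) of at least 50° C.) are assumptions, and only a handful of rows from the table above are included.

```python
from typing import List, NamedTuple, Tuple


class Enzyme(NamedTuple):
    seq_id: int     # polypeptide SEQ ID NO
    ph_opt: float   # pH optimum
    t_opt: float    # temperature optimum, degrees C
    sa: float       # specific activity on pNP-beta-glucopyranoside, U/mg


# A few rows copied from the table above (not the full list).
ROWS = [
    Enzyme(264, 5, 80, 43.93),
    Enzyme(94, 6, 60, 164.026),
    Enzyme(388, 6, 60, 168.0),
    Enzyme(326, 5, 37, 14.94),
    Enzyme(392, 6, 80, 44.0),
    Enzyme(30, 8, 37, 36.6),
]


def shortlist(rows: List[Enzyme],
              ph_range: Tuple[float, float] = (5, 6),
              min_t_opt: float = 50) -> List[Enzyme]:
    """Keep enzymes inside the (hypothetical) process window, highest SA first."""
    low, high = ph_range
    hits = [r for r in rows if low <= r.ph_opt <= high and r.t_opt >= min_t_opt]
    return sorted(hits, key=lambda r: r.sa, reverse=True)


for enzyme in shortlist(ROWS):
    print(enzyme.seq_id, enzyme.sa)
```

Specific activity on a dye-labeled model substrate is only one criterion; activity on cellobiose and cellohexaose is a separate measurement and would need to be considered as well.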

The activity of the exemplary enzymes of the invention SEQ ID NO:264 (encoded by SEQ ID NO:263), SEQ ID NO:94 (encoded by SEQ ID NO:93) and SEQ ID NO:388 (encoded by SEQ ID NO:387) was tested on cellobiose and cellohexaose. SEQ ID NO:94 (encoded by SEQ ID NO:93) and SEQ ID NO:388 (encoded by SEQ ID NO:387) were both significantly more active on cellohexaose than on cellobiose, while SEQ ID NO:264 (encoded by SEQ ID NO:263) had almost equivalent activity on these two substrates. The K_(m) for cellobiose of SEQ ID NO:264 (encoded by SEQ ID NO:263) was determined to be approximately 2.5 mM, consistent with literature values for other similar enzymes. Based on these results the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263) was chosen as the top candidate to be used in the enzyme cocktails. During a biomass to ethanol process high concentrations of glucose are expected to accumulate and the potential for product inhibition exists. Experiments were designed to test for product inhibition of the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263). FIG. 81 shows the results, which indicated that the exemplary SEQ ID NO:264 (encoded by SEQ ID NO:263) is indeed inhibited by high concentrations of glucose. FIG. 81 graphically illustrates data showing glucose inhibition of the exemplary enzyme SEQ ID NO:264 (encoded by SEQ ID NO:263). Hydrolysis of Res-β-glucopyranoside was monitored in increasing concentrations of added glucose. Glucose concentrations ranged from 0 to 60 mM at the start of the reaction. Solid lines are fits to the Michaelis-Menten equation with increasing K_(m) values.
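A Michaelis-Menten fit in which K_(m) rises with added glucose is the pattern expected for competitive product inhibition, where the apparent K_(m) equals K_(m)(1 + [glucose]/K_(i)). The sketch below assumes that model; the text does not state which inhibition model was used, and the V_(max), K_(m) and K_(i) values here are placeholders rather than measured constants for SEQ ID NO:264.

```python
def apparent_km(km_mM: float, glucose_mM: float, ki_mM: float) -> float:
    """Apparent Km under competitive inhibition (assumed model)."""
    return km_mM * (1.0 + glucose_mM / ki_mM)


def mm_rate(substrate_mM: float, vmax: float, km_mM: float) -> float:
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Placeholder constants, chosen only for illustration.
VMAX = 1.0        # arbitrary rate units
KM = 0.5          # mM, hypothetical Km for the reporter substrate
KI = 10.0         # mM, hypothetical inhibition constant for glucose
SUBSTRATE = 0.5   # mM, hypothetical substrate concentration

# Glucose range of 0 to 60 mM follows the range described in the text.
for glucose in (0.0, 15.0, 30.0, 60.0):
    km_app = apparent_km(KM, glucose, KI)
    print(glucose, round(km_app, 2), round(mm_rate(SUBSTRATE, VMAX, km_app), 3))
```

Under this model V_(max) is unchanged, so inhibition can in principle be overcome at high substrate concentration; other inhibition models would behave differently.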

Discovery of cellobiohydrolases of the invention was a combination of activity- and sequence-based approaches. The major focus was on fungal genes since the available literature suggests that, in general, fungal cellobiohydrolases are more active than their bacterial counterparts. Forty-one full-length fungal genes were discovered using sequence-based discovery, and these genes were expressed and characterized. All 41 were cloned into Pichia and the filamentous fungus Cochliobolus heterostrophus; two were shown to be active in the Pichia constructs while 28 were active in the Cochliobolus constructs. Activity was measured on AVICEL® microcrystalline cellulose (MCC) and phosphoric acid swollen cellulose (PASC).

FIG. 82 graphically illustrates data showing the range of PASC hydrolysis activity in all 41 strains of the invention, as compared to the wildtype controls (C5 and MelKO). Wildtype endogenous cellulase and hemicellulase activity is repressed when grown in a xylose or glucose medium (CMX contains xylose as the carbon source). In these assays the exemplary strains SEQ ID NO:98 (encoded by SEQ ID NO:97) and SEQ ID NO:450 (encoded by SEQ ID NO:449) (both family 6 CBH) had the highest activity after a 3 day growth in the 24 well plates. FIG. 82 graphically illustrates data showing digestion of phosphoric acid swollen cellulose (PASC) by recombinant C. heterostrophus strains comprising nucleic acids of the invention (encoding enzymes of the invention). C5 is the wild type strain, MKO is a strain with the melanin locus knocked out and CMX is the growth medium. Secreted protein was incubated with PASC for 2 hrs at 50° C., pH 5. The amount of product produced was measured by the addition of β-glucosidase, glucose oxidase, horseradish peroxidase and Amplex Red.

Several family 7 and family 6 containing strains were selected and growth was scaled up to 500 mL shake flasks. As seen in FIG. 83, activity is dependent upon the number of days in the shake flask and tends to vary from strain to strain; FIG. 83A shows PASC activity of 5 different family 6 CBH containing strains during growth in 500 mL shake flasks; FIG. 83B shows PASC activity of 4 different family 7 CBH containing strains during growth in 500 mL shake flasks.

The cellobiohydrolases were isolated from the exemplary enzymes of the invention SEQ ID NO:98 (encoded by SEQ ID NO:97), SEQ ID NO:452 (encoded by SEQ ID NO:451) and SEQ ID NO:34 (encoded by SEQ ID NO:33) using either an affinity chromatography matrix (cellobiose based) or classical size exclusion chromatography. Proteomics analysis confirmed that the isolated protein was indeed the correct enzyme. Expression levels were estimated to be approximately mg active enzyme/L culture broth. These isolated proteins were used in the enzyme cocktails as described herein.

Digestion of PCS Using Enzyme Cocktails of the Invention

The invention provides enzyme cocktails to process biomass; for example, exemplary enzyme cocktails of the invention, when incubated with an appropriate pretreated biomass feedstock, have the following performance characteristics: in 48 hrs release 75% and 40% of theoretical glucose and xylose, respectively, using 5% solids and 20 mg/g cellulose.

Enzymes discovered (as discussed, above) were combined in such a way as to obtain the highest level of conversion of pretreated biomass. A number of pretreated biomass samples were used in the evaluation, including low and high severity pretreated corn stover as well as pretreated cob samples (“Jaygo”). HPLC methods were utilized to monitor sugar release. Percent conversion was calculated based on compositional analysis of the pretreated material. In order to achieve these performance targets several different classes of enzymes were combined in appropriate ratios. These cocktails are referred to as “EX”, where “X” is the number of enzymes combined. Performance was monitored by contacting the various enzyme cocktails with a pretreated biomass sample and measuring sugars released into the liquid phase. FIG. 84 shows the progression of percent conversion as different enzymes of the invention were combined; the figure describes exemplary enzyme mixtures of the invention, e.g., E10, E9, etc., and graphically illustrates the improvement in glucose and xylose conversion as enzymes of the invention are added to the cocktail.

Cocktail performance (of enzyme mixes of the invention) was assessed on different pretreated materials (which vary in feedstock (stover vs. cob) and pretreatment characteristics, e.g., high, medium and low severity) and compared to the performance of SPEZYME® cellulase at 2 different enzyme loadings (15 and 60 FPU/g cellulose). These data are shown in FIG. 85. In all cases the cocktails of the invention (enzyme mixes of the invention) outperformed the lower SPEZYME® cellulase loading and outperformed both low and high loadings of SPEZYME® cellulase for xylose and arabinose conversion. FIG. 85 graphically illustrates digestion of pretreated biomass feedstocks by SPEZYME® enzyme (15 and 60 FPU) and the exemplary enzyme mix of the invention designated “E9”, by showing the amount of sugar released at 48 hrs (FIG. 85A, glucose released; FIG. 85B, xylose released; FIG. 85C, arabinose released). The horizontal line in each figure represents approximately 50% theoretical conversion. LPCS: low severity alkPCS; LePCS: extended time LPCS; MPCS1: medium severity condition 1 (140° F., 15% NH₃); MPCS2: medium severity condition 2 (170° C., 5% NH₃); HPCS: high severity alkPCS; cob: ammonia soaked cob, Jaygo 1.

The Trichoderma enzymes CBH I and CBH II were used in the above exemplary cocktails of the invention; however, any cellobiohydrolases can be used. For example, FIG. 86 graphically illustrates data showing that the exemplary enzymes of the invention SEQ ID NO:34 (encoded by SEQ ID NO:33) and SEQ ID NO:98 (encoded by SEQ ID NO:97) can replace T. reesei CBH I and II in cocktails of the invention, e.g., the exemplary enzyme mix of the invention designated “E8”. FIG. 86 graphically illustrates data showing glucose release from Jaygo 2 (5 wt %) during incubation with the exemplary “E8” cocktail supplemented with either T. reesei CBH I and II or SEQ ID NO:34 (encoded by SEQ ID NO:33) and SEQ ID NO:98 (encoded by SEQ ID NO:97).

Protein Purification and Quantitation

To quantify the amount of active enzyme used in the exemplary cocktails of the invention, it was necessary to purify each enzyme and determine the specific activity of the pure (or enriched) protein. These data were then used to estimate the level of active protein in the protein samples used in the cocktails. For six of the enzymes, crude cell-free extracts were used in the cocktails; hence, the amount of enzyme in the crude mixtures was back-calculated based on the purified activity. However, for the remaining two exemplary enzymes of the invention, SEQ ID NO:98 (encoded by SEQ ID NO:97) and SEQ ID NO:34 (encoded by SEQ ID NO:33), the partially purified samples were actually used in the cocktails, so estimates of active enzyme were based solely on SDS-PAGE analysis.

List of Enzymes Purified:

-   SEQ ID NO:264 (encoded by SEQ ID NO:263): β-glucosidase
-   SEQ ID NO:106 (encoded by SEQ ID NO:105): endoglucanase
-   SEQ ID NO:100 (encoded by SEQ ID NO:99): family 11 xylanase
-   SEQ ID NO:102 (encoded by SEQ ID NO:101): family 10 xylanase
-   SEQ ID NO:96 (encoded by SEQ ID NO:95): β-xylosidase
-   SEQ ID NO:92 (encoded by SEQ ID NO:91): α-arabinofuranosidase
-   SEQ ID NO:98 (encoded by SEQ ID NO:97): family 6 cellobiohydrolase
-   SEQ ID NO:34 (encoded by SEQ ID NO:33): family 7 cellobiohydrolase

The purification results are summarized below:

Enzyme                                      % active enzyme
SEQ ID NO: 264 (encoded by SEQ ID NO: 263)  5.6
SEQ ID NO: 106 (encoded by SEQ ID NO: 105)  3
SEQ ID NO: 100 (encoded by SEQ ID NO: 99)   15.5
SEQ ID NO: 102 (encoded by SEQ ID NO: 101)  18.7
SEQ ID NO: 96 (encoded by SEQ ID NO: 95)    2.7
SEQ ID NO: 92 (encoded by SEQ ID NO: 91)    34
SEQ ID NO: 98 (encoded by SEQ ID NO: 97)    46
SEQ ID NO: 34 (encoded by SEQ ID NO: 33)    20
Tr CBH I                                    87
Tr CBH II                                   51

This table lists enzymes of the invention (by their “SEQ ID NO:” designations) used in biomass degrading cocktails and the estimated percent of active enzyme in crude preparations.
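The back-calculation described in this section can be sketched as a ratio of specific activities: if the purified enzyme has a known specific activity, the fraction of active enzyme in a crude preparation is the crude specific activity divided by the purified one. The function names and the specific-activity numbers below are hypothetical; only the 5.6% figure is taken from the table above.

```python
def percent_active(crude_sa_U_per_mg: float, pure_sa_U_per_mg: float) -> float:
    """Percent of total crude protein that is active enzyme (assumed ratio method)."""
    return 100.0 * crude_sa_U_per_mg / pure_sa_U_per_mg


def active_enzyme_mg(crude_protein_mg: float, pct_active: float) -> float:
    """Milligrams of active enzyme contained in a given mass of crude protein."""
    return crude_protein_mg * pct_active / 100.0


# Hypothetical specific activities (U/mg), for illustration only.
print(percent_active(crude_sa_U_per_mg=2.0, pure_sa_U_per_mg=40.0))  # 5.0

# Using the tabulated 5.6% active enzyme for SEQ ID NO:264 and a
# hypothetical 10 mg dose of crude protein.
print(round(active_enzyme_mg(10.0, 5.6), 2))  # 0.56
```

This ratio assumes that all activity in the crude extract on the assay substrate comes from the enzyme of interest, which may not hold if the host background has similar activity.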

Enzymatic Digestion of Pretreated Biomass

Enzymes, and enzyme cocktails, of the invention are used to process biomass, as demonstrated by their effectiveness in processing pretreated biomass samples, e.g., the test sample “Jaygo 2”. The composition of “Jaygo 2” is:

                                                    Theoretical 100% conversion
        Percent      Ratio                          (5% solids)
        composition  (liquid/solid)  Total          (g/L)     (mM)
Glucan  42.9         0.010           43.33          21.67     120.37
Xylan   31.22        0.084           33.85          16.93     112.84

Each reaction was sampled at various time points and product concentration was determined by HPLC-RI. These values were used to calculate percent conversion during enzymatic hydrolysis, based on the composition of Jaygo 2 and the theoretical concentrations of glucose and xylose after 100% conversion in a 5% solids reaction (tabulated above).
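The theoretical 100% conversion values can be approximated from the composition table. In the sketch below, the “Total” percentage is applied to the solids loading to get g/L, and that figure is divided by the molar mass of the free sugar. Two assumptions are made that the text does not state: that “5% solids” means 50 g/L, and that no hydration correction is applied. With those assumptions the results land within a fraction of a millimolar of the tabulated 120.37 mM and 112.84 mM. The function names are illustrative.

```python
GLUCOSE_MW = 180.16  # g/mol, free glucose
XYLOSE_MW = 150.13   # g/mol, free xylose


def theoretical_mM(total_percent: float, solids_percent: float, monomer_mw: float) -> float:
    """Theoretical sugar concentration (mM) at 100% conversion."""
    solids_g_per_L = solids_percent * 10.0  # assumption: 5% solids = 50 g/L
    polymer_g_per_L = solids_g_per_L * total_percent / 100.0
    return 1000.0 * polymer_g_per_L / monomer_mw


def percent_conversion(measured_mM: float, theoretical: float) -> float:
    """Percent conversion from a measured HPLC concentration."""
    return 100.0 * measured_mM / theoretical


glucose_max = theoretical_mM(43.33, 5.0, GLUCOSE_MW)
xylose_max = theoretical_mM(33.85, 5.0, XYLOSE_MW)
print(round(glucose_max, 1), round(xylose_max, 1))

# Hypothetical measured glucose concentration of 90 mM at 5% solids.
print(round(percent_conversion(90.0, glucose_max), 1))
```

Doubling the solids loading to 10% doubles both theoretical values, which is consistent with the 240 mM glucose and 226 mM xylose used as 100% for the 10% solids reactions.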

In order to directly compare the performance of enzymes of the invention to a commercial benchmark, SPEZYME® cellulase performance was tested under the same conditions. The standard dosage of SPEZYME® cellulase of 15 FPU/g cellulose is equivalent to 58 mg protein/g cellulose. In the following experiments 7.5 FPU cellulase (29 mg) was combined with the protein equivalent of MULTIFECT® xylanase for a total of 58 mg/g cellulose. FIG. 87 shows the data obtained using 5% solids (“Jaygo 2”, see above) in both absolute concentration and percent conversion. FIG. 88 shows the data set for 10% solids. FIG. 87 graphically illustrates data showing digestion of Jaygo 2 (5% solids) using 7.5 FPU/g cellulose SPEZYME® cellulase plus 7.5 “FPU equivalents”/g cellulose MULTIFECT® xylanase (in total 58 mg/g cellulose). Percent conversion was based on 120 mM glucose and 113 mM xylose as 100%. FIG. 88 graphically illustrates data showing digestion of Jaygo 2 (10% solids) using 7.5 FPU/g cellulose SPEZYME® cellulase plus 7.5 “FPU equivalents”/g cellulose MULTIFECT® xylanase (in total 58 mg/g cellulose). Percent conversion was based on 240 mM glucose and 226 mM xylose as 100%. Based on these data, performance is:

Performance Parameters               SPEZYME® enzyme  Benchmark SPEZYME® enzyme
mg active enzyme/g cellulose         58               58
Glucose: % Conversion                75               73
Glucose: Time for conversion (hr)    48               48
Xylose: % Conversion                 59               57
Xylose: Time for conversion (hr)     48               48
% Solids                             5                10

The cocktail of 10 enzymes of the invention, designated “E10”, has very high biomass saccharification activity. In this cocktail of the invention, four of the enzymes are responsible for digesting cellulose while the remainder are active on hemicellulose. As described above, a combination of protein purification, SDS-PAGE analysis and enzyme assays allowed a quantitative measure of the amount of active enzyme in each of the crude preparations. In order to reduce the overall protein used in saccharification reactions, a systematic approach was undertaken to remove redundant and unnecessary enzymes from the exemplary “E10” cocktail of the invention. It was determined that 2 of the enzymes, the exemplary SEQ ID NO:442 (encoded by SEQ ID NO:441) (an α-glucuronidase) and the exemplary SEQ ID NO:440 (encoded by SEQ ID NO:439) (a ferulic acid esterase), contributed very little to overall performance and were removed from the cocktail, resulting in an E8 mixture. Experiments were carried out to determine which of the cellobiohydrolases (CBH I, CBH II, the exemplary SEQ ID NO:98 (encoded by SEQ ID NO:97) and/or the exemplary SEQ ID NO:34 (encoded by SEQ ID NO:33)) were the most effective. The performance of three different mixes was assessed; see discussion, above. In all three cases the total enzyme composition was tabulated and was below the 20 mg/g cellulose limit outlined in the target (case 1=18.4 mg/g; case 2=19.2 mg/g and case 3=17.2 mg/g).

FIGS. 89 and 90 show the time courses of saccharification of “Jaygo 2” (see above) using the three enzyme mixes. While there were some minor differences in rates between the cases, all three resulted in almost exactly 80% recovery of glucose and 62% recovery of xylose within 48 hrs. FIG. 89 graphically illustrates data showing glucose release from Jaygo 2 (5% solids) catalyzed by three different exemplary enzyme mixes of the invention, or “E8” cocktails: CBH I/CBH II is Case 1; CBH I/SEQ ID NO:98 (encoded by SEQ ID NO:97) is Case 2; and the exemplary enzymes of the invention SEQ ID NO:34 (encoded by SEQ ID NO:33)/SEQ ID NO:98 (encoded by SEQ ID NO:97) is Case 3. Glucose concentration was determined by HPLC analysis of the saccharified liquors sampled at 4, 20, 30 and 48 hrs. Percent conversion was calculated by using 120 mM as 100% available glucose in the pretreated solids. Reaction conditions are pH 5.5 and 50° C. FIG. 90 graphically illustrates data showing xylose release from Jaygo 2 (5% solids) catalyzed by three different exemplary enzyme mixes of the invention, or “E8” cocktails: CBH I/CBH II; CBH I/SEQ ID NO:98 (encoded by SEQ ID NO:97); and the exemplary enzymes of the invention SEQ ID NO:34 (encoded by SEQ ID NO:33)/SEQ ID NO:98 (encoded by SEQ ID NO:97). Xylose concentration was determined by HPLC analysis of the saccharified liquors sampled at 4, 20, 30 and 48 hrs. Percent conversion was calculated by using 113 mM as 100% available xylose in the pretreated solids. Reaction conditions are pH 5.5 and 50° C. The performance of the exemplary enzyme mixes of the invention, or “E8” cocktails, compared to SPEZYME® cellulase is tabulated below:

Performance Parameters               Benchmark SPEZYME® Cellulase  Case 1  Case 2  Case 3
mg active enzyme/g cellulose         58                            18.4    19.2    17.2
Glucose: % Conversion                75                            76      79      76
Glucose: Time for conversion (hr)    48                            48      48      48
Xylose: % Conversion                 59                            57      58      59
Xylose: Time for conversion (hr)     48                            <20     <20     <20
% Solids                             5                             5       5       5

In summary, E8 outperformed SPEZYME® cellulase/MULTIFECT® xylanase (rate and extent) with approximately one-third the amount of protein/g cellulose.
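The “approximately one-third” figure can be checked directly from the loadings given above (58 mg/g cellulose for the benchmark; 18.4, 19.2 and 17.2 mg/g for Cases 1 to 3), along with the 20 mg/g cellulose limit. The sketch below is a simple arithmetic check; the function and constant names are illustrative.

```python
BENCHMARK_MG_PER_G = 58.0  # SPEZYME/MULTIFECT loading, mg protein/g cellulose
LIMIT_MG_PER_G = 20.0      # target limit for total enzyme, mg/g cellulose

CASES = {"Case 1": 18.4, "Case 2": 19.2, "Case 3": 17.2}


def protein_fraction(cocktail_mg_per_g: float,
                     benchmark_mg_per_g: float = BENCHMARK_MG_PER_G) -> float:
    """Cocktail protein loading as a fraction of the benchmark loading."""
    return cocktail_mg_per_g / benchmark_mg_per_g


def within_limit(cocktail_mg_per_g: float, limit: float = LIMIT_MG_PER_G) -> bool:
    """True if the total enzyme loading is at or below the target limit."""
    return cocktail_mg_per_g <= limit


for name, loading in CASES.items():
    print(name, within_limit(loading), round(protein_fraction(loading), 2))
```

All three cases fall between roughly 0.30 and 0.33 of the benchmark protein loading, which is where the one-third comparison comes from.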

Enzymes of the Invention for Higher Solids Saccharification

The invention also provides compositions (enzyme cocktails) for processing biomass solids loadings higher than 5%, but with low enzyme content, and methods for processing biomass solids loadings higher than 5%, but with low enzyme content. The performance of enzyme cocktails of the invention at 10% solids was evaluated, where the amount of protein in the cocktails was reduced from 20 mg/g cellulose to approximately 12 mg/g cellulose. Initial experiments were performed at enzyme loadings similar to the standard SPEZYME® cellulase/MULTIFECT® xylanase mixtures (58 mg protein/g cellulose). Under these reaction conditions the exemplary “E9” enzyme cocktail of the invention reached 74% and 70% conversion for glucose and xylose, respectively. FIG. 91 graphically illustrates data showing the digestion of Jaygo 2 (10% solids) using 58 mg “E9”/g cellulose; percent conversion was based on 240 mM glucose and 226 mM xylose as 100%. These data are summarized in the following table, which shows the performance characteristics of an exemplary “E9” enzyme cocktail of the invention at 58 mg/g cellulose loading and 10% solids:

Performance Parameters               E9 cocktail
mg active enzyme/g cellulose         58
Glucose: % Conversion                74
Glucose: Time for conversion (hr)    48
Xylose: % Conversion                 71
Xylose: Time for conversion (hr)     48
% Solids                             10

The next goal was to decrease protein dosage to approximately 12 mg/g cellulose. Four different recipes for the exemplary “E9” enzyme cocktail of the invention were used, altering the hemicellulase and cellulase ratios. FIG. 92 shows the amount of xylose and glucose released at 36 and 48 hrs for each situation. FIG. 92 graphically illustrates the level of conversion of glucose (G1) and xylose (X1) using 10% solids Jaygo 2 and a number of enzyme recipes that vary in cellulase and hemicellulase content. Cellulose hydrolysis was sensitive to both the cellulase and hemicellulase concentrations (a synergy between the enzyme types), whereas hemicellulose hydrolysis (as measured by xylose release) was sensitive only to hemicellulase content. Under these conditions xylose conversion is maintained at about 60% at 36 hrs while glucose conversion drops to approximately 50%, as compared to performance at a higher enzyme loading.

A systematic study was undertaken in order to clarify the interplay between biomass solids content and enzyme loading. Reactions were set up with 18 mg protein/g cellulose and 9 mg protein/g cellulose at 1%, 5% and 10% Jaygo 2. FIGS. 93A and 93B summarize the data and show that glucose release was much more sensitive to solids loading than xylose release; in fact, at the high enzyme load (18 mg/g) there was almost no difference in xylose yield between the different percent solids in the reactor. Possible explanations for the decrease in performance as substrate concentration increases are (1) product inhibition by glucose, xylose, cellobiose or xylobiose, (2) mass transfer (mixing) deficiencies or (3) a combination of both. FIG. 93A graphically illustrates percent glucose conversion at 48 hrs using different enzyme mixes of the invention (the “E8” cocktail) and solids (Jaygo 2) loadings; FIG. 93B graphically illustrates percent xylose conversion at 48 hrs using different enzyme mixes of the invention (the “E8” cocktail) and solids (Jaygo 2) loadings.

The performance of these exemplary enzyme cocktails of the invention, as compared to the performance target and measured benchmarks, is summarized below:

Performance                               Measured     Exemplary E8  Exemplary E8  Exemplary E9
Parameters                    Benchmark   Benchmark*   Cocktail**    Cocktail      Cocktail
mg active enzyme/g cellulose  20          58 (15 FPU)  19.2          12            58
Glucose: % conversion         80          73           79            50            74
Glucose: Conversion time (h)  48          48           48            36            48
Xylose: % conversion          65          57           58            62            70
Xylose: Conversion time (h)   48          48           <20           36            48
% solids                      2.5         10           5             10            10

*7.5 FPU Spezyme cellulase plus 7.5 “FPU equivalents” MULTIFECT xylanase
**Case 2

Hemicellulase Characterization

The invention provides enzymes, and enzyme mixes, or “cocktails”, that are effective in processing, or hydrolyzing, plant hemicelluloses, which are complex, branched molecules consisting of a main chain of β1,4-linked xylan decorated with a variety of other sugars, such as arabinose, galactose and mannose; on occasion the xylan may also be decorated with glucuronic acid, acetylated to a certain extent and linked to lignin via ferulic acid ester or ether linkages. In alternative aspects, the enzymes and enzyme mixes completely or partially degrade the hemicellulose to monomer, or to intermediate oligomers and monomers. The enzyme mixes, or “cocktails”, of the invention are particularly effective because hemicellulose is a much more complex material than cellulose and requires mixes of enzymes to completely degrade to monomer.

The invention provides effective endo-xylanases that can quickly partially degrade a hemicellulose, e.g., a hemicellulose in pretreated biomass, into smaller oligosaccharides. The endo-xylanase(s) of the invention, or any endo-xylanase(s) used in the enzyme mixes, or “cocktails”, of the invention, can solubilize biomasses, e.g., hemicellulose-comprising solids, and produce oligosaccharides on which the other hemicellulases will act. Two exemplary xylanases of the invention, the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) and SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99) enzymes, were demonstrated to be able to perform this function. Both of these enzymes are so-called family 11 endoxylanases.

Additional screening comprised screening about 250 xylanase enzymes alone or in combination with the exemplary SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99) and/or SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) enzymes for their ability to release reducing sugars from a pretreated biomass sample (low and high severity alkaline PCS). We found that the exemplary SEQ ID NO:100 (encoded, e.g., by SEQ ID NO:99) was a better performer than the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443) (i.e., released slightly more reducing sugars at pH 5 and 50° C.) and that the addition of a family 10 endoxylanase increased the xylose yield.

Several candidates were found and the exemplary SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101) enzyme of the invention performed the best. β-xylosidases are responsible for the conversion of xylooligomers into xylose monomer; eight β-xylosidases were screened for effectiveness on xylobiose at pH 5 and 50° C. and one candidate was chosen, the exemplary SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95). Arabinofuranosidases (the exemplary SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91), SEQ ID NO:454 (encoded, e.g., by SEQ ID NO:453) and SEQ ID NO:456 (encoded, e.g., by SEQ ID NO:455) enzymes of the invention) were also screened for enhancement of xylose release and release of arabinose. As can be seen in FIGS. 94A and 94B, the addition of arabinofuranosidase not only increased the yield of arabinose from PCS but also allowed more xylose to be released. FIG. 94 graphically illustrates xylose release (FIG. 94A) and arabinose release (FIG. 94B) from low severity alkPCS (2.2% solids) by xylosidase (the exemplary SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95)); xylanase (the exemplary SEQ ID NO:444 (encoded, e.g., by SEQ ID NO:443)) and arabinofuranosidase (the exemplary SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91)) (pH 5, 50° C.).

It was also found that the addition of a ferulic acid esterase, the exemplary SEQ ID NO:440 (encoded, e.g., by SEQ ID NO:439), and α-glucuronidase, the exemplary SEQ ID NO:442 (encoded, e.g., by SEQ ID NO:441), resulted in slightly higher xylose release from PCS. Therefore, SEQ ID NO:440 (encoded, e.g., by SEQ ID NO:439) and SEQ ID NO:442 (encoded, e.g., by SEQ ID NO:441) enzymes were included in these exemplary enzyme cocktails of the invention.

Analytical Characterization of Unhydrolyzed Oligomeric Xylan

In one aspect, the processes of the invention comprise initial digestion of insoluble polysaccharides, e.g., in biomasses, such as pretreated corn samples, to soluble oligosaccharides; and, in one aspect, their ultimate conversion to monomeric sugars. A SPEZYME cellulase/MULTIFECT xylanase mixture can produce complex soluble oligosaccharides from pretreated corn cob samples. Acid hydrolysis of the saccharification liquors resulted in a substantial increase in glucose and xylose monomer, indicating that the enzyme mixture is deficient in exoglycosidase activity. The exemplary enzyme mix of the invention, the so-called “E10 cocktail”, on the other hand, produced a much simpler mix of soluble oligosaccharides.

FIG. 95 illustrates chromatograms of the results of using the exemplary “E10 cocktail” enzyme mix of the invention to digest Jaygo 2 (5% solids) after 48 hr incubation (FIG. 95A) and subsequent acid hydrolysis of those liquors (FIG. 95B). This table shows the concentrations of sugar represented in each chromatogram as well as the percent theoretical conversion (based on approximately 118 mM as 100%):

                  Enzyme    Post Acid
Glucose           95 mM     104 mM
(% theoretical)   (81)      (88)
Xylose            80 mM     103 mM
(% theoretical)   (68)      (87)
Mannose           -         1.5 mM
Galactose         -         4 mM
Arabinose         11 mM     14 mM

Following acid hydrolysis, the level of sugar release increased to almost 90%, indicating that a majority of the sugars were in soluble form but for some reason were not completely degraded to monomer.
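The percent-of-theoretical figures in the table above follow from simple arithmetic, which the short sketch below reproduces. It is illustrative only: it assumes, as stated above, that approximately 118 mM corresponds to 100% conversion, and it further assumes that the same 118 mM basis applies to both glucose and xylose. The function and variable names are hypothetical.

```python
# Illustrative sketch: percent of theoretical conversion for the FIG. 95 table.
# Assumption: approximately 118 mM is taken as 100% for both glucose and xylose.
THEORETICAL_MM = 118.0


def percent_theoretical(conc_mm, theoretical_mm=THEORETICAL_MM):
    """Return a measured concentration (mM) as a percentage of the theoretical maximum."""
    if theoretical_mm <= 0:
        raise ValueError("theoretical_mm must be positive")
    return 100.0 * conc_mm / theoretical_mm


# Concentrations (mM) reported in the table: after enzyme digestion alone
# and after subsequent acid hydrolysis of the same liquors.
reported_mm = {
    "glucose": {"enzyme": 95.0, "post_acid": 104.0},
    "xylose": {"enzyme": 80.0, "post_acid": 103.0},
}

if __name__ == "__main__":
    for sugar, stages in reported_mm.items():
        for stage, conc in stages.items():
            pct = percent_theoretical(conc)
            print(f"{sugar:8s} {stage:10s} {pct:5.1f}% of theoretical")
```

Rounded to whole percentages, these values match the parenthesized entries in the table; the gap between the enzyme-only and post-acid columns is the fraction of sugar that remained as soluble oligomer.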

In order to aid in the discovery of enzymes that could act on the recalcitrant oligosaccharides, we set out to more carefully analyze the hydrolysate. Mass spectrometry was used to further characterize the oligomeric region of E10 and SPEZYME® cellulase/MULTIFECT® xylanase generated liquors. These data indicated that SPEZYME® cellulase/MULTIFECT® xylanase produced complex cello- and xylo-oligomers whereas the exemplary “E10 cocktail” enzyme mix of the invention mainly produced a tetrameric oligomer of pentose sugars and a trimer of mixed pentose and hexose sugars. The exemplary E10 cocktail enzyme mix saccharified liquors were fractionated using either the BioRad or SHODEX™ column (Thomson Instruments, Clear Brook, Va.) and samples were collected for more detailed analysis. The oligomeric region was divided separately into 4 different fractions (Peak 1, 2, 2.5 and 3 in FIG. 96), which illustrates an HPLC of fractionated E10 saccharification liquors.

Capillary electrophoresis analysis of the individual samples, as illustrated in FIG. 96 and summarized in this table, suggested that each peak contained a major oligosaccharide:

       Peak 1   Peak 2   Peak 2.5   Peak 3
Xyl    47.5     60.9     35.1       19.3
Ara    26.0     27.1     11.9       4.6
Glu    15.6     7.9      48.3       72.0
Gal    10.9     4.2      4.7        4.1

Acid hydrolysis of these fractions indicated that peaks 1 and 2 are composed mainly of arabinose and xylose while peaks 2.5 and 3 are mainly glucose. LC-MS of each of these fractions indicated that peak 2 is consistent with a tetramer of C-5 sugars. Coupled with the CE and HPLC of acid hydrolyzed fractions, we concluded that the major component in the unhydrolyzed material was an arabinoxylan composed of 3 xylose and 1 arabinose molecules, or “AX₃”. FIG. 97 illustrates the results of a capillary electrophoresis of the fractionation of E10 enzyme mix-digested saccharification liquors (upper panel). The lower panel contains standard mono- and oligosaccharides. The table gives the percent xylose, arabinose, glucose and galactose in each peak as determined by the CE data in FIG. 81.

Overexpression of Enzyme Encoding Genes of the Invention

In one aspect, filamentous fungi, such as fungi of the genus Cochliobolus, e.g., the filamentous fungus Cochliobolus heterostrophus, are used as an expression system to express enzymes of the invention because, for example, a filamentous fungus system can satisfy a requirement for glycosylation of cellobiohydrolases, if desired, and they lack endogenous cellulase and hemicellulase activity. These enzyme systems of the invention can produce larger quantities of protein, including in one aspect secreting large quantities of enzymes of the invention.

Studies of gene expression including mRNA analysis (single and global gene analysis) and active protein detection can be used to optimize expression systems. RT-PCR can be used to monitor steady state mRNA levels for specific genes during various times in the growth phase. The amount can be normalized to the total RNA and can be compared to a constitutively and highly expressed gene (e.g., EF1α). Message quantity can be expressed in the “cycle number” or C_(t) value. Tables of the C_(t) values vs time are shown below; the exemplary enzymes of the invention SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:446 (encoded, e.g., by SEQ ID NO:445) are three recombinant cellobiohydrolases integrated in the melanin locus; and for these experiments the C. heterostrophus strains were grown on xylose as the carbon source. Genes with C_(t) values of 19-20 are considered to be highly expressed.

                 SEQ ID NO: 98         SEQ ID NO: 34         SEQ ID NO: 446
                 (encoded by           (encoded by           (encoded by
                 SEQ ID NO: 97)        SEQ ID NO: 33)        SEQ ID NO: 445)
Sample   Time    Ct Mean  Ct Std Dev   Ct Mean  Ct Std Dev   Ct Mean  Ct Std Dev
CBH      24      19.5     0.09         20.7     0.10         21.3     0.08
         48      20.4     0.09         21.1     0.01         21.8     0.11
         72      20.5     0.06         20.5     0.03         23.4     0.13
         96      20.3     0.06         19.7     0.01         21.8     0.11
         120     19.8     0.06         20.6     0.04         20.6     0.04
EF1-α    24      19.1     0.59         19.0     0.14         19.7     0.04
         48      19.8     0.12         19.7     0.02         20.1     0.06
         72      20.8     0.05         19.8     0.08         21.7     0.07
         96      20.8     0.08         19.5     0.04         20.1     0.01
         120     20.0     0.03         19.4     0.02         19.2     0.06

Message levels for the non-CBH genes are lower than those for the CBH genes, as seen in the table below. As a point of reference, a ΔCt of 1 corresponds to a 2-fold difference in messenger concentration. In particular, the exemplary enzyme of the invention SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) has a Ct number of about 26, much higher than the others.

                                                        Bacterial Enzymes     EF1-alpha
No.   Time   Name                                       Ct Mean  Ct Std Dev   Ct Mean  Ct Std Dev
1     24     SEQ ID NO: 264 (encoded by SEQ ID NO: 261) 22.0     0.01         22.3     0.54
2     72     SEQ ID NO: 264 (encoded by SEQ ID NO: 261) 23.0     0.05         22.2     0.16
3     24     SEQ ID NO: 100 (encoded by SEQ ID NO: 99)  24.2     0.23         20.8     0.05
4     72     SEQ ID NO: 100 (encoded by SEQ ID NO: 99)  25.7     0.26         22.0     0.04
5     24     SEQ ID NO: 96 (encoded by SEQ ID NO: 95)   23.1     0.06         20.5     0.08
6     72     SEQ ID NO: 96 (encoded by SEQ ID NO: 95)   22.1     0.07         20.9     0.07
7     24     SEQ ID NO: 106 (encoded by SEQ ID NO: 105) 26.4     0.11         20.6     0.19
8     72     SEQ ID NO: 106 (encoded by SEQ ID NO: 105) 26.5     0.07         21.9     0.02
9     24     SEQ ID NO: 92 (encoded by SEQ ID NO: 91)   21.5     0.02         22.5     0.06
10    72     SEQ ID NO: 92 (encoded by SEQ ID NO: 91)   22.1     0.03         21.1     0.06
11    24     Mel-KO                                     -        -            21.3     0.02
12    72     Mel-KO                                     -        -            21.5     0.02
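The comparison of Ct values against the EF1-α reference can be made concrete with a short calculation. The sketch below is illustrative only and relies on the rule of thumb stated above, namely that a ΔCt of 1 corresponds to a 2-fold difference in message; it therefore assumes ideal doubling per cycle, which real assays only approximate. The function names are hypothetical, and the Ct values are the 24 time point means for two of the enzymes in the table above.

```python
# Illustrative sketch: comparing message levels from Ct values.
# Assumption (from the text): a Ct difference of 1 cycle is a 2-fold difference in message.


def delta_ct(ct_target, ct_reference):
    """Ct of the gene of interest minus Ct of the reference gene (here EF1-alpha)."""
    return ct_target - ct_reference


def relative_abundance(ct_target, ct_reference):
    """Message level of the target relative to the reference: 2 ** (-delta Ct)."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


# (Ct mean of the enzyme message, Ct mean of EF1-alpha) at the 24 time point.
ct_at_24 = {
    "SEQ ID NO:106": (26.4, 20.6),
    "SEQ ID NO:92": (21.5, 22.5),
}

if __name__ == "__main__":
    for name, (ct_gene, ct_ef1a) in ct_at_24.items():
        dct = delta_ct(ct_gene, ct_ef1a)
        ratio = relative_abundance(ct_gene, ct_ef1a)
        print(f"{name}: delta Ct = {dct:+.1f}, level relative to EF1-alpha = {ratio:.3f}")
```

Under these assumptions a higher Ct means less message, so the SEQ ID NO:106 message, with a Ct nearly 6 cycles above EF1-α, is on the order of 50-fold less abundant than the reference in the same sample.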

Use of Pectinases in Enzyme Cocktails of the Invention

The invention provides enzyme cocktails comprising pectinase enzymes, including enzymes of the invention having pectinase activity. Addition of pectinase to the exemplary enzyme mix of the invention, including the “E8 cocktail”, improved glucose yields by 5-6% from Jaygo 2 samples. Far less cellobiose accumulated during saccharification of 5 and 10% solids (Jaygo 2) when the exemplary β-glucosidase SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) was used in the cocktail instead of SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263). When Cochliobolus is grown using a biomass sample as the carbon source instead of glucose, approximately 4000 genes are differentially expressed.

Enzyme stability to ethanol was tested by performing saccharification reactions in the presence of 1, 2 and 5% ethanol. The exemplary enzyme mix of the invention E8 cocktail (18 mg/g cellulose) was used with 10% solids (Jaygo 2). The table below shows the amount of glucose and xylose released at 48 hrs from each of the reactions. As the ethanol concentration is increased there is an overall drop in sugar yields, though glucose is more affected than xylose.

EtOH %   G1-48 hr   X1-48 hr
0%       59         66
1%       57         65
2%       55         63
5%       49         63
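The statement that glucose is more affected by ethanol than xylose can be checked by expressing each yield as a fraction of the no-ethanol control. The sketch below is illustrative only; it uses the 48 hr values from the table above as reported, and the function names are hypothetical.

```python
# Illustrative sketch: yield retained in the presence of ethanol, relative to the 0% control.
# Values are the 48 hr glucose (G1) and xylose (X1) entries from the table above.
yields_48hr = {
    0: {"glucose": 59, "xylose": 66},
    1: {"glucose": 57, "xylose": 65},
    2: {"glucose": 55, "xylose": 63},
    5: {"glucose": 49, "xylose": 63},
}


def percent_retained(value, control):
    """Yield at a given ethanol level as a percentage of the no-ethanol control."""
    if control == 0:
        raise ValueError("control yield must be non-zero")
    return 100.0 * value / control


def retention_table(data, control_level=0):
    """Map ethanol % -> {sugar: percent of the control yield retained}."""
    control = data[control_level]
    return {
        level: {sugar: percent_retained(v, control[sugar]) for sugar, v in row.items()}
        for level, row in data.items()
    }


if __name__ == "__main__":
    for level, row in retention_table(yields_48hr).items():
        print(f"{level}% EtOH: glucose {row['glucose']:.1f}%, xylose {row['xylose']:.1f}% of control")
```

At 5% ethanol the retained glucose yield falls further below the control than the retained xylose yield, consistent with the observation above.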

The addition of pectinases (6 were tested) to the enzyme cocktails improved the yield of glucose from the saccharification of a Jaygo 2 sample (5% solids). The level of improvement observed was 5-6% over control reactions. Additional biochemistry was carried out to further characterize the enzymes. In particular, SDS-PAGE analysis was used to quantify the amount of pectinase that was used in the mixtures. As illustrated in FIG. 98, SDS-PAGE was done on the 6 pectinases tested in the E8 cocktails. 5 μg total protein was used in each lane. Percent purity was estimated based on densitometry.

At least in these assays, the most effective pectinase in the cocktail was the exemplary SEQ ID NO:458 (encoded, e.g., by SEQ ID NO:457) enzyme; addition of this enzyme to the enzyme cocktail E8 resulted in the greatest increase in glucose yield over the control (from 70% in the control to 76% with SEQ ID NO:458 (encoded, e.g., by SEQ ID NO:457) within 48 hrs). FIG. 98 shows that the exemplary SEQ ID NO:458 (encoded, e.g., by SEQ ID NO:457) only constituted approximately 4% of the total protein, though it resulted in the highest increase in glucose.

Saccharified liquor samples were fractionated, producing sufficient quantities of the recalcitrant oligosaccharides to screen. Acid hydrolysis of the fractionated material was consistent with previous data. Six family 51 arabinofuranosidases were subcloned, expressed and shown to be active on model, dye-labeled substrates.

CBH and β-Glucosidase Optimization

The E8 enzyme cocktails containing the exemplary β-glucosidase SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) accumulated less cellobiose (hence made more glucose) than cocktails containing the exemplary SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263). SDS-PAGE, as illustrated in FIG. 99, was used to normalize the protein amounts used in those cocktails. The exemplary SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) contained almost 10-fold more enzyme than SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263); however, less cellobiose accumulated when SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) was used in the cocktail, implying that the enzyme either has a higher specific activity or is less inhibited by product. FIG. 99 illustrates an SDS-PAGE of the six β-glucosidases tested in the E8 cocktails. 5 μg total protein was used in each lane. Percent purity was estimated based on densitometry.

In one aspect, the enzyme expressing genes of the invention are overexpressed, e.g., to improve protein secretion by the desired expression system, e.g., a fungal expression system, such as a Cochliobolus system. Mutants can be generated by both chemical and UV mutagenesis. A screening system has been developed and validated.

Study of the expression patterns of endogenous Cochliobolus glycosyl hydrolases also can be helpful in designing enzyme mixes of the invention. Expression can be repressed by glucose and xylose and induced by growth on biomass samples. Time courses of gene expression can be generated using real time RT-PCR. Directed mutations can be made in specific catabolite repressor and protease genes to study the effects on heterologous gene expression and protein production.

Very little performance loss was seen in the enzyme mix “E8” at 1% and 2% ethanol, though much greater loss was observed at 5% ethanol. Significant performance loss was observed at 60° C., mostly attributable to CBH instability. The exemplary SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93), a β-glucosidase, is expressed at very high levels in E. coli and at least in these assays is a better performer than SEQ ID NO:264 (encoded, e.g., by SEQ ID NO:263); thus, SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) has become a standard β-glucosidase (bG) in the cocktail. Strain development of Cochliobolus can develop a hypersecretor with improved overall protein yields.

Regarding the impact of pectinase on cocktail performance, test results showed that a small improvement in performance (a 5-6% glucose increase) was obtained when pectinases were added to a 5% solids reaction. However, no obvious improvement was observed in reactions that contained 10% solids; there was no apparent dose response. Our conclusion was that the addition of a pectinase to the current cocktails provides no real benefit, at least with respect to these assay systems.

Five new family 51 arabinofuranosidases were subcloned: the exemplary enzymes of the invention SEQ ID NO:460 (encoded, e.g., by SEQ ID NO:459), SEQ ID NO:462 (encoded, e.g., by SEQ ID NO:461), SEQ ID NO:464 (encoded, e.g., by SEQ ID NO:463), SEQ ID NO:466 (encoded, e.g., by SEQ ID NO:465) and SEQ ID NO:468 (encoded, e.g., by SEQ ID NO:467). These five enzymes tested positive for activity on 4-MU-arabinofuranoside. These five enzymes plus 11 family 51 arabinofuranosidases were tested for their ability to degrade partially purified AX₃; thus, this exemplary mix of the invention comprises: SEQ ID NO:470 (encoded, e.g., by SEQ ID NO:469), SEQ ID NO:472 (encoded, e.g., by SEQ ID NO:471), SEQ ID NO:474 (encoded, e.g., by SEQ ID NO:473), SEQ ID NO:476 (encoded, e.g., by SEQ ID NO:475), SEQ ID NO:478 (encoded, e.g., by SEQ ID NO:477), SEQ ID NO:480 (encoded, e.g., by SEQ ID NO:479), SEQ ID NO:482 (encoded, e.g., by SEQ ID NO:481), SEQ ID NO:484 (encoded, e.g., by SEQ ID NO:483), SEQ ID NO:486 (encoded, e.g., by SEQ ID NO:485), SEQ ID NO:488 (encoded, e.g., by SEQ ID NO:487) and SEQ ID NO:490 (encoded, e.g., by SEQ ID NO:489). A saccharified liquor from a SPEZYME cellulase/hemicellulase reaction was used. A chromatogram of the unfractionated mixture is shown in FIG. 100, an illustration of an HPLC-RI trace of the unfractionated saccharification liquors showing the recalcitrant oligosaccharides (F2), cellobiose (CB), glucose (G), xylose (X) and arabinose (A). The sample was treated with glucose oxidase to remove glucose and xylose monomer, resulting in the sample shown in FIG. 101, an HPLC-RI trace of the fractionated saccharification liquors showing the recalcitrant oligosaccharides (F2), cellobiose (CB), glucose (G), xylose (X) and arabinose (A).

When this sample was treated with the exemplary enzymes of the invention β-glucosidase SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) and β-xylosidase SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95), glucose appeared and a major reduction in the “F2” region was observed (as illustrated in FIG. 102), indicating that this region of the chromatogram contained cellooligomers of various chain lengths. The new F2 region became very symmetrical and appeared to be a single species. FIG. 102 illustrates an HPLC-RI trace of the sample shown in FIG. 101 with the exemplary enzymes SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) and SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95). Four enzymes showed some activity in degrading “fraction 2” (recalcitrant xylo-oligomers) and generating arabinose: the exemplary enzymes of the invention SEQ ID NO:482 (encoded, e.g., by SEQ ID NO:481), SEQ ID NO:478 (encoded, e.g., by SEQ ID NO:477), SEQ ID NO:484 (encoded, e.g., by SEQ ID NO:483) and SEQ ID NO:486 (encoded, e.g., by SEQ ID NO:485).

Treatment of the sample shown in FIG. 102 with the exemplary arabinofuranosidase SEQ ID NO:482 (encoded, e.g., by SEQ ID NO:481) resulted in almost complete loss of the peak at approx. 10.5 minutes and a subsequent increase in xylose and arabinose, as illustrated in FIG. 103, an HPLC-RI trace of the sample shown in FIG. 102 with the exemplary arabinofuranosidase SEQ ID NO:482 (encoded, e.g., by SEQ ID NO:481). The conclusion from these data is that the family 51 arabinofuranosidase SEQ ID NO:482 (encoded, e.g., by SEQ ID NO:481) in combination with the exemplary enzymes of the invention β-xylosidase SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95) and β-glucosidase SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) is able to digest the majority of the recalcitrant oligosaccharides to monomer glucose and xylose.

In one aspect, to optimize expression systems, chemical (EMS) and UV mutagenesis of cell systems, e.g., fungal systems, such as Cochliobolus, can be done. Approximately 1400 mutants were screened for increased xylanase and β-glucosidase activity. Close to 60 hits were observed, of which approximately 10 were reconfirmed after secondary screening.

Targeted deletion of catabolite repressor genes has resulted in an increase in protein production. Alternative secretion signals can be tested with enzymes of the invention, e.g., heterologous secretion signals (in addition to or with heterologous leader or signal sequences) can be spliced onto enzymes of the invention, e.g., the exemplary CBH genes SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97).

In summary, sixteen family 51 arabinofuranosidases were tested for the ability to digest the recalcitrant xylo-oligomers (“fraction 2”) from the saccharified liquors. The exemplary enzyme of the invention SEQ ID NO:482 (encoded, e.g., by SEQ ID NO:481) in combination with the β-glucosidase (bG) SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) and the β-xylosidase (bX) SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95) converted “fraction 2” into xylose, glucose and arabinose, though at a relatively slow rate. Thus, in one aspect, a family 3 carbohydrate binding domain is appended to the exemplary endoglucanase SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105).

In one aspect, the invention provides methods for modifying the sequences of exemplary enzymes of the invention with a goal of, e.g., modifying activity, such as increasing activity under specific environmental conditions, such as high temperature or pH, high salt conditions, high substrate concentrations, e.g., by evolution using Gene Site Saturation Mutagenesis (GSSM, discussed above) technology, e.g., of the exemplary enzyme of the invention CBH SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33).

Additional enzymes were tested for the ability to degrade “fraction 2”. One enzyme in particular, SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103), in combination with the bG SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) and the bX SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95), converted nearly all the AX₃ present in fraction 2 to xylose and arabinose, as illustrated in FIG. 104. FIG. 104 illustrates an HPLC analysis of the digestion of fractionated soluble oligomers (AX₃) by the exemplary enzymes of the invention: SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103), a family 62 arabinofuranosidase; SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95) (bX); and, SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) (bG). In FIG. 104, the top panel is substrate only and the bottom panel is after 14 hr enzyme incubation. The identities of each peak are shown above (F2: fraction 2, CB: cellobiose, G: glucose, X: xylose and A: arabinose).

The enzyme SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103), a family 62 arabinofuranosidase (a Cochliobolus enzyme), was expressed in Pichia pastoris. This enzyme SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103) was used to process an unfractionated hydrolysate. Though almost complete conversion took place, the rate was much slower than the reaction described above for FIG. 104 (degrading “fraction 2” with SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103); SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95) (bX); and, SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93)). In order to investigate whether the decreased rate was due to high concentrations of monomer sugars (glucose and xylose) in the hydrolysate, the fractionated material was spiked with either glucose or xylose or both at concentrations equivalent to what is found in the original material. Under these conditions we observed equivalent performance to the fractionated material, suggesting that glucose and xylose in and of themselves did not inhibit SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103), and that some other compound(s) must be the source of inhibition.

Both glucose and xylose yields from Jaygo 2 (5% solids) benefited from the addition of the arabinofuranosidase SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103) (1 mg/ml protein concentration) to the E8 cocktail of the invention (18 mg enzyme/g, T. reesei CBH I and II). Chromatograms still showed the presence of some fraction 2.

                                                        Glucose         Xylose
                                                        (% at 48 hrs)   (% at 48 hrs)
E8                                                      77              66
E8 + SEQ ID NO: 104 (encoded, e.g., by SEQ ID NO: 103)  82              70

Interestingly, when the fractionated oligomeric sugars were incubated with the E8 cocktail which contained the cellobiohydrolases of the invention SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97), we observed almost complete conversion of fraction 2 to glucose and xylose, as illustrated in the HPLC data plot of FIG. 105. FIG. 105 illustrates data showing digestion of fractionated soluble oligomers (AX₃) by the E8 cocktail that contained the exemplary enzymes SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97). The top panel is substrate only and the bottom panel is after 14 hr enzyme incubation. The identities of each peak are shown above (F2: fraction 2, CB: cellobiose, G: glucose, X: xylose and A: arabinose).

Further experimentation revealed that the SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) preparation contained the activity responsible for the conversion. SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) is a heterologously expressed CBH produced in Cochliobolus. This particular preparation is an ammonium sulfate fractionation; therefore, it contains other enzymes secreted by Cochliobolus. Proteomics analysis of the sample showed that it contained at least 37 other proteins. Included are 2 xylanases, 1 ferulic acid esterase, 1 xylosidase and 2 arabinofuranosidases. Thus, one or more of these proteins most likely is responsible for the observed effect.

Endoglucanase (EG) Optimization

Several variants of SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), also called EG1, have been constructed; these include the exemplary enzymes of the invention EG1_CD, EG1_CDDDED, EG1_CDCBM3 (for example, EG1_CDDDED comprises SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) with an esterase domain, a carbohydrate binding domain):

Clone        Description
EG1          EG Catalytic domain, dockerin domains* (original EG in E8)
EG1_CD       EG Catalytic domain
EG1_CDDDED   EG Catalytic domain, dockerin domains, esterase domain
EG1_CDCBM3   EG Catalytic domain, CBM3

* dockerin domains are cohesin domains on cellulosomal scaffolding proteins, see, e.g., Leibovitz (1996) J. Bacteriol. 178: 3077-3084; Leibovitz (1997) J. Bacteriol. 179(8): 2519-2523.

SDS-PAGE analysis showed that expression levels of each of the variants were higher than the original construct, SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105). The performance of each in the E8 cocktail was assessed. Enzyme loadings were normalized based on estimated expression levels from SDS-PAGE analysis. The table below details the yields of glucose and xylose after 48 hr reaction on Jaygo 2 and a loading of 18 mg/g cellulose (5% solids). The data shown in the table are the average of 2 independent experiments with standard deviations ≦2%:

Variant                                            % Glucose   % Xylose
SEQ ID NO: 106 (encoded, e.g., by SEQ ID NO: 105)  74          64
EG1_CD                                             74          64
EG1_CDDDED                                         77          67
EG1_CDCBM3                                         81          70

The data suggest that appending a carbohydrate binding domain to the endoglucanase had a positive effect on the yield of glucose.

Overexpression Systems

The enzymes of the invention can be expressed using any host cell. Host cell systems can be optimized to generate overexpression/high expression systems. Deletion of the catabolite repression gene, creA, in a Cochliobolus host cell appeared to have a positive impact on the expression level of the exemplary enzyme SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) (50 mL shake flask). PASC hydrolysis by secreted protein was much higher in the Cochliobolus creA mutant than in the Cochliobolus controls and the other Cochliobolus strains, as illustrated in FIG. 106. This reaction can also be done in 30 L fermenters. FIG. 106 graphically illustrates the hydrolysis of PASC by secreted enzyme SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) of various gene knockouts.

In summary, the Cochliobolus enzyme SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103) (family 62 arabinofuranosidase) in combination with bG SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) and bX SEQ ID NO:96 (encoded, e.g., by SEQ ID NO:95) converted the recalcitrant xylo-oligosaccharides into xylose, glucose and arabinose. Addition of SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103) to the E8 cocktail showed an increase in the level of glucose and xylose monomer. Furthermore, a contaminating protein(s) in the SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) (CBH I) preparation also converted the recalcitrant material into monomer sugar. Several variations of CBM-endoglucanase (EG), e.g., SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), combinations were constructed and had a positive effect on the release of glucose from Jaygo 2. At least in these assays, the best construct was EG1_CDCBM3 with a “family 3” cellulose-binding module (CBM) appended to the C-terminus of the catalytic domain. In one aspect, enzyme sequences of the invention are “evolved” by a sequence mutation technology, e.g., GSSM technology; for example, the cellobiohydrolase SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) sequence is modified using GSSM.

Hemicellulases

FIG. 107 compares the product profiles of the exemplary enzyme mix of the invention “E8” comprising T. reesei CBH I and II to the exemplary enzyme mix of the invention “E8” comprising the SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97) enzymes of the invention. The major differences were seen in the oligomer region and in the amount of xylose and arabinose monomers produced. FIG. 107 illustrates the product profile from a 48 hr saccharification of Jaygo 2 by the exemplary enzyme mix of the invention “E8” comprising T. reesei CBH I and II and SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97).

The combination of the exemplary SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97) and the family 62 ARF SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103) in the E8 cocktail resulted in almost 80% xylose as monomer and 72% glucose as monomer in 24 hrs, as illustrated in FIG. 108. FIG. 108 graphically illustrates enzyme progress curves comparing E8 cocktails (SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33)/SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97)) with or without SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103). Reaction conditions: 5% solids (Jaygo 2), pH 5.5, 50° C., 18.9 mg enzyme/g cellulose.

Endoglucanase (EG) Optimization

The chimeric enzyme of the invention EG1_CDCBM3 (comprising the EG SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) and a CBM) was more active on AVICEL™ microcrystalline cellulose than SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105).

The product profile of the exemplary “E8” (CBH I, CBH II) cocktail containing EG1_CDCBM3 instead of SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) showed fewer dimeric sugars (cellobiose/xylobiose) and more monomers (glucose/xylose) than the exemplary “E8” cocktail with SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105), as illustrated by the HPLC data plot of FIG. 109. However, there was essentially no difference in the amount of total glucose (G1+G2) between the two. Depending on the desired hydrolysis result, in some aspects, this provides the opportunity to reduce the amount of the β-glucosidase (bG, or βG) SEQ ID NO:94 (encoded, e.g., by SEQ ID NO:93) in the cocktail.

CBH and β-Glucosidase Optimization

GSSM evolved mutants of the exemplary CBH enzyme SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) were sequenced, confirming that the process produced appropriate coverage at each of four sites (within the mutagenized SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33)):

          # Sent to                  # Amino
          Sequencing   # Analyzed    Acids     # of Codons
Site #1   96           61            17        21
Site #2   96           71            16        23
Site #3   96           70            15        17
Site #4   120          81            17        23
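The coverage figures in the table above can be summarized per site with a few ratios. The sketch below is illustrative only and rests on two assumptions not stated in the text: that “# Amino Acids” counts the distinct amino acids observed among the analyzed clones, and that complete coverage at a site would mean all 20 standard amino acids. The names used are hypothetical.

```python
# Illustrative sketch: summarizing GSSM sequencing coverage per site.
# Assumptions: "# Amino Acids" is the number of distinct amino acids observed among the
# analyzed clones, and full coverage would be all 20 standard amino acids.
STANDARD_AMINO_ACIDS = 20

# site -> (clones sent to sequencing, clones analyzed, amino acids, codons), from the table.
sites = {
    "Site #1": (96, 61, 17, 21),
    "Site #2": (96, 71, 16, 23),
    "Site #3": (96, 70, 15, 17),
    "Site #4": (120, 81, 17, 23),
}


def coverage_summary(sent, analyzed, amino_acids, codons):
    """Return simple coverage ratios for one mutagenized site."""
    return {
        "fraction_analyzed": analyzed / sent,
        "amino_acid_coverage": amino_acids / STANDARD_AMINO_ACIDS,
        "codons_per_amino_acid": codons / amino_acids,
    }


if __name__ == "__main__":
    for site, counts in sites.items():
        s = coverage_summary(*counts)
        print(
            f"{site}: {s['fraction_analyzed']:.0%} of clones analyzed, "
            f"{s['amino_acid_coverage']:.0%} amino acid coverage, "
            f"{s['codons_per_amino_acid']:.2f} codons per amino acid"
        )
```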

A microtiter plate-based high throughput assay can be used to assay enzymes of the invention, and their variants and mixtures (“cocktails”), using, e.g., PASC (crystalline cellulose that is made more amorphous through swelling by acid treatment) and/or pretreated biomass (xylanase predigested, designated “HT3”) as a substrate.

Overexpression of Candidate Genes

Chemical mutagenesis of Cochliobolus resulted in a large number of new strains that have improved secreted xylanase and β-glucosidase (with 4-methylumbelliferyl cellobioside as substrate) activities. This table summarizes the results of multiple rounds of mutagenesis and screening:

                                               Primary   Secondary
Set          Plates   Primary Assay            Hits      Hits
Mut. Trial   16       Xylanase + 4Mu-Cell      62        14
1            23       Xylanase                 54        4
2            48       Xylanase                 89        22
3            50       Xylanase                 41        12
4            50       Xylanase + 4Mu-Cell      22        10
5            48       Xylanase + 4Mu-Cell      131       26
6            47       Xylanase + 4Mu-Cell      66        21
7            50       Xylanase + 4Mu-Cell      135       -
8*           48       Xylanase + 4Mu-Cell      -         -
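One way to compare the screening rounds in the table above is the fraction of primary hits that also appear as secondary hits. The sketch below is illustrative only; it assumes that secondary hits are the subset of primary hits reconfirmed on rescreening, and it treats table cells with no reported value as missing rather than zero. The names used are hypothetical.

```python
# Illustrative sketch: reconfirmation rate per mutagenesis set.
# Assumption: secondary hits are primary hits reconfirmed on rescreening.
# None marks a cell for which the table reports no value.
screens = {
    "Mut. Trial": (62, 14),
    "1": (54, 4),
    "2": (89, 22),
    "3": (41, 12),
    "4": (22, 10),
    "5": (131, 26),
    "6": (66, 21),
    "7": (135, None),
    "8": (None, None),
}


def reconfirmation_rate(primary_hits, secondary_hits):
    """Return secondary/primary as a fraction, or None if either count is unavailable."""
    if primary_hits is None or secondary_hits is None or primary_hits == 0:
        return None
    return secondary_hits / primary_hits


if __name__ == "__main__":
    for name, (primary, secondary) in screens.items():
        rate = reconfirmation_rate(primary, secondary)
        shown = "not available" if rate is None else f"{rate:.1%}"
        print(f"Set {name}: {shown}")
```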

In summary, the exemplary CBH preparation comprising the SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33) enzyme of the invention contains an activity that converts the recalcitrant oligosaccharides into xylose and arabinose. The glycosyl hydrolase family 62 arabinofuranosidase improves the yield of glucose when added to the “E8” cocktail. A new cocktail containing SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97) and SEQ ID NO:104 (encoded, e.g., by SEQ ID NO:103) resulted in approx. 80% xylose monomer and 72% glucose monomer at a loading of 18.9 mg/g cellulose (5% solids, Jaygo 2). Addition of a family 3 CBM to EG SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105) (designated EG1_CDCBM3) improved the performance on AVICEL® microcrystalline cellulose and in the cocktail on Jaygo 2. GSSM was used to “evolve” SEQ ID NO:34 (encoded, e.g., by SEQ ID NO:33), and the activity of these mutants was validated in the fungal expression host Cochliobolus. Chemical mutagenesis on Cochliobolus was effected with the goal of improving levels of secreted protein.

Hemicellulases

Three arabinoxylan oligomers were obtained to act as standards for the major soluble recalcitrant compound. These were AX2 and two forms of AX3, one with the arabinose linked to the central xylose and the other with the arabinose linked to the non-reducing xylose (all α-1,3). Capillary electrophoresis and ¹³C NMR analyses were performed on these samples and compared to our isolated compound, as illustrated in FIGS. 110 and 111. The major compound isolated from saccharified liquors has an electromigration time equivalent to AX3 with the arabinose attached to the non-reducing end of the xylan backbone; however, the NMR spectra of the two molecules are different. This pattern may explain why a GH51 ARF appears to be inactive on our compound. FIG. 110 illustrates the results of a capillary electrophoresis of APTS labeled arabinoxylan fragments, where #1, 2, and 3 are standard molecules while #5 and 6 are molecules isolated from saccharified liquors. The signal at about 4 minutes is free APTS. FIG. 111 illustrates ¹³C NMR spectra of arabinoxylan fragments.

CBH and β-Glucosidase Optimization

488 out of 491 residues of SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) were mutagenized using GSSM technology (described above). Close to half were transformed into E. coli, and some were transformed into Cochliobolus. Alternative assay substrates are AVICEL microcrystalline cellulose, PASC or xylanase predigested HT3 (×HT3). Normalization to active enzyme in the culture broth is important in assay development. An ELISA based technique can also be used.

Overexpression of Candidate Genes

74 chemically mutagenized mutants of Cochliobolus were grown in medium containing 1% Jaygo for 4 days, and supernatant was collected and assayed for xylanase and beta-glucosidase (with 4-methylumbelliferyl cellobioside as substrate) activities. Dry cell weight of each mutant was determined at day 4.

11 Cochliobolus mutants had beta-glucosidase activity higher than the wild type Cochliobolus by mean + 3 standard deviations (SD), and more than 26 mutants had activity higher than the wild type by mean + 2 SD; see FIG. 112. FIG. 113 illustrates secreted protein activity (against 4-MU-cellobioside) of 74 mutagenized Cochliobolus strains. The strains were grown for 4 days in 24 well plates using 1% Jaygo 2. Activity was normalized to dry cell weight in each well. Wild type is labeled “C5”. Aspergillus cloning vectors can also be used.
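The hit-calling rule described above can be sketched in a few lines. This is only a minimal illustration, assuming that the threshold is the mean of wild-type replicate wells plus k standard deviations and that activity is first divided by dry cell weight; the text does not state which population the mean and SD are taken from. The function name, strain labels and numbers are hypothetical and are not data from this document.

```python
from statistics import mean, stdev

def call_hits(wild_type, mutants, k=3.0):
    """Return names of mutants whose normalized activity exceeds mean + k*SD.

    wild_type: list of (activity, dry_cell_weight) tuples for wild-type wells.
    mutants:   dict mapping strain name -> (activity, dry_cell_weight).
    Assumption: the threshold is derived from wild-type replicates.
    """
    # Normalize each wild-type well to its dry cell weight.
    wt_norm = [activity / weight for activity, weight in wild_type]
    threshold = mean(wt_norm) + k * stdev(wt_norm)
    # Keep mutants whose normalized activity is above the threshold.
    return sorted(
        name
        for name, (activity, weight) in mutants.items()
        if activity / weight > threshold
    )

# Hypothetical values, for illustration only.
wt = [(100.0, 10.0), (110.0, 10.0), (90.0, 10.0)]
mut = {"M1": (200.0, 10.0), "M2": (125.0, 10.0), "M3": (105.0, 10.0)}
hits_3sd = call_hits(wt, mut, k=3.0)
hits_2sd = call_hits(wt, mut, k=2.0)
```

Lowering k from 3 to 2 widens the hit list, which mirrors the larger number of strains reported at the mean + 2 SD cutoff.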

In summary, standard arabinoxylans were used to get a better understanding of the nature of recalcitrant oligosaccharides resulting from enzymatic digestion of pretreated cob samples. Capillary electrophoresis and ¹³C-NMR showed that the recalcitrant material is different from known standards. GSSM technology was used to “evolve” the exemplary enzyme of the invention SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) (a CBH). Assay development focus is on the choice of substrate and method of detection. A number of chemically mutagenized strains of Cochliobolus secrete more protein than the wild type strain.

Hemicellulases

As noted above, it was found that a contaminating enzyme activity within the SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) preparation can convert recalcitrant soluble oligosaccharides present in the standard saccharification liquors to monomeric sugars. This activity was purified from the crude SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) sample (discussed above) using a combination of deglycosylation and chromatography. The protein was identified by SDS-PAGE and proteomics, as illustrated in FIG. 113. Based on the proteomics analysis it was concluded that the protein responsible for the hydrolysis was a native Cochliobolus GH family 3 enzyme. The gene is currently being subcloned and will be expressed in Pichia and Aspergillus. FIG. 113 is an illustration of an SDS-PAGE showing the purification of the enzyme activity responsible for the hydrolysis of recalcitrant oligosaccharides. Lane (1): starting material, (2) active fraction after FPLC, (3) EndoH treated sample 2, (4-12) protein fractions following a second FPLC, (7-9) the most active fractions. The box shows the most common protein band in lanes 7-9; this band was excised from the gel and sequenced.

CBH and β-Glucosidase Optimization

As noted above, the exemplary enzyme SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) was “evolved” using GSSM technology, and a library of mutants was cloned into E. coli. 100 out of 491 mutants (altered sites) have been transformed into Cochliobolus.

Enzyme amounts can be determined in each well of a 96-well plate. After activity measurements, the values can be normalized to protein content to correct for expression and growth variations in the plate. ELISA based analysis of enzyme activity can be used, and optimized. An example of SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) grown in a 96-well plate and enzyme activity assayed on the substrate PASC is shown in FIG. 114. The corresponding ELISA data are also plotted on the same figure. FIG. 114 illustrates functional (yellow, or right bar) and quantitation (blue, or left bar, is ⅕ CMX; and red, or middle bar, is 1/20 PBS) data from a 96 well plate of wild type SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) expressed in Cochliobolus. The functional assay was on PASC and the quantitation was done using a specific polyclonal antibody towards SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33). The conclusion is that the quantitation and function data are not yet correlating.
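One way to make the comparison between quantitation and function data concrete is to compute a per-well specific activity and a Pearson correlation coefficient between the two readouts. The following is a minimal sketch under stated assumptions: the two lists hold one value per well in the same well order, the helper names are hypothetical, and the numbers are invented for illustration rather than taken from FIG. 114.

```python
from math import sqrt

def specific_activity(activity, protein):
    """Divide each well's activity by its protein signal (e.g., an ELISA value)."""
    return [a / p for a, p in zip(activity, protein)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-well readouts, for illustration only.
activity = [2.0, 4.0, 6.0, 8.0]   # functional assay signal (e.g., on PASC)
elisa = [1.0, 2.0, 3.0, 4.0]      # quantitation signal
r = pearson(activity, elisa)
normalized = specific_activity(activity, elisa)
```

A coefficient near 1 would indicate that the quantitation signal tracks activity across wells; a value near 0 would correspond to the lack of correlation noted above.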

Though the ELISA assay did not appear to correlate with enzyme activity, Western blots were more consistent (i.e., did correlate with enzyme activity), as illustrated in FIG. 115. FIG. 115 is an illustration of Western blots of specific wells from FIG. 114, as discussed above (where SEQ ID NO:34 (encoded by, e.g., SEQ ID NO:33) was grown in a microtiter well plate and enzyme activity assayed on the substrate PASC). The lanes are labeled with position as well as H (high), M (medium) and L (low) for activity in the well.

Multi-channel capillary electrophoresis can be used to screen the GSSM library. CE has several advantages over the reducing sugar and glucose oxidase coupled reactions. The one potential disadvantage is throughput; however, the MEGABACE™ (Global Medical Instrumentation, Inc. Ramsey, Minn.) instrument has 96 different capillaries in parallel and can be set up to perform medium throughput runs. Shown in FIG. 116 is an example of standards of glucose, cellobiose and maltotetraose in 48 of the channels. Since there is capillary to capillary variation, the data must be further processed to align the standards and generate standard curves; however, it is apparent that this method can be useful for screening a GSSM library. FIG. 116 is an illustration of unaligned electrophoretograms from 48 channels from a 96 channel MegaBACE™ instrument. Various concentrations of glucose (G1), cellobiose (G2) and maltotetraose (G4) were labeled with APTS and separated on the MegaBACE™.
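The standard-curve step mentioned above can be illustrated with a per-capillary linear fit. This is a hedged sketch only: it assumes that peak areas have already been extracted and aligned, that detector response is linear in concentration over the range used, and that each capillary gets its own curve to absorb capillary to capillary variation. The capillary labels, function names and values are hypothetical.

```python
def fit_line(concentrations, peak_areas):
    """Ordinary least-squares fit of peak area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(peak_areas) / n
    slope = sum(
        (c - mean_c) * (a - mean_a) for c, a in zip(concentrations, peak_areas)
    ) / sum((c - mean_c) ** 2 for c in concentrations)
    return slope, mean_a - slope * mean_c

def quantify(peak_area, slope, intercept):
    """Invert the standard curve to estimate concentration from a peak area."""
    return (peak_area - intercept) / slope

# Hypothetical standards: (concentrations, peak areas) for each capillary.
standards = {
    "cap01": ([1.0, 2.0, 4.0], [10.0, 20.0, 40.0]),
    "cap02": ([1.0, 2.0, 4.0], [12.0, 24.0, 48.0]),
}

# One curve per capillary, so that differing responses are handled separately.
curves = {cap: fit_line(conc, area) for cap, (conc, area) in standards.items()}
estimate = quantify(30.0, *curves["cap01"])
```

Fitting each capillary separately is one simple design choice; pooling capillaries after normalizing to an internal standard would be an alternative that the text neither specifies nor excludes.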

A GSSM library also was constructed of the β-glucosidase SEQ ID NO:94 (encoded by SEQ ID NO:93); i.e., the exemplary enzyme SEQ ID NO:94 (encoded by SEQ ID NO:93) was “evolved” using GSSM technology, and a library of expression vectors comprising the various mutations was made.

Overexpression of Genes

Cochliobolus were chemically mutagenized, screened for expression, and mutated strains with increased protein production were identified. As noted above, two secreted enzymes, xylanase and general cellulase, were studied in this system. These data indicated that 11 of the original 74 strains appeared to secrete more β-glucosidase activity than the wild type strain (mean activity > 3 SD from the wild type). These experiments were repeated and the same results seen, as illustrated in FIG. 117. FIG. 117 shows data reconfirming the high protein expression of top candidate “over-expressing” Cochliobolus strains. Strains were grown in 1% Jaygo 2 for 4 days and culture broths assayed for hydrolysis of 4-MUB-cellobioside. Activity was corrected for dry cell weight in the culture. The lane designated “C5” is the wild type control.

In addition, SDS-PAGE gels were run on the secreted protein, as illustrated in FIG. 118. The SDS-PAGE results suggest that there is an overall increase in total protein secreted from each strain. FIG. 118 is an illustration of an SDS-PAGE of the secreted protein of the top 10 “over-expressing” Cochliobolus strains. 15 µL of culture broth was loaded into each lane. The numbers along the bottom are the strain designation (see FIG. 117 for the corresponding activity measurement). The lane designated “C5” is the wild type control.

In one aspect, enzymes of the invention are expressed in these “over-expressing” Cochliobolus strains; e.g., in strains engineered to express the exemplary enzymes of the invention CBHs SEQ ID NO:34 (encoded by SEQ ID NO:33) and SEQ ID NO:98 (encoded by SEQ ID NO:97). In another aspect, enzymes of the invention are expressed in an Aspergillus model: 10 CBH genes were cloned and transformed into Aspergillus. Expression was tested in 8 of the new strains and 7 show very clear protein bands on SDS-PAGE. Examples of ten different transformants of the exemplary enzymes SEQ ID NO:98 (encoded by SEQ ID NO:97) and SEQ ID NO:34 (encoded by SEQ ID NO:33) are shown in FIG. 119. FIG. 119 illustrates an SDS-PAGE of the secreted proteins of 10 individual transformants of SEQ ID NO:98 (encoded by SEQ ID NO:97) and SEQ ID NO:34 (encoded by SEQ ID NO:33) in Aspergillus. Untransformed host is shown in the lane labeled “wt”.

A comparison between the production of SEQ ID NO:98 (encoded by SEQ ID NO:97) and SEQ ID NO:34 (encoded by SEQ ID NO:33) in Cochliobolus and Aspergillus is shown in FIG. 120. Clearly, the Aspergillus system is producing more protein. In addition, the molecular weights for the produced protein from each sample look similar. FIG. 120 illustrates an SDS-PAGE of Aspergillus- and Cochliobolus-produced SEQ ID NO:34 (encoded by SEQ ID NO:33) and SEQ ID NO:98 (encoded by SEQ ID NO:97) enzymes. Lane 1: molecular weight standards; Lane 2: SEQ ID NO:98 (encoded by SEQ ID NO:97), Cokie (15 µL); Lane 3: SEQ ID NO:98 (encoded by SEQ ID NO:97), Aspergillus (5 µL); Lane 4: SEQ ID NO:34 (encoded by SEQ ID NO:33), Cokie (15 µL); Lane 5: SEQ ID NO:34 (encoded by SEQ ID NO:33), Aspergillus (5 µL). The activity of these, or any Aspergillus- and Cochliobolus-expressed enzymes of the invention (including testing the enzyme mixes or “cocktails” of the invention), also can be tested using the substrate PASC.

Example 13 Exemplary Enzyme Cocktails of the Invention

This example describes exemplary enzyme cocktails of the invention. In one embodiment, a “cocktail” of the invention comprises SEQ ID NO:46 (encoded, e.g., by SEQ ID NO:45) as the CBH I (or, alternatively, the CBH I can be SEQ ID NO:34, encoded, e.g., by SEQ ID NO:33) and SEQ ID NO:524 (encoded, e.g., by SEQ ID NO:523) as the xylanase (or, alternatively, the xylanase can be SEQ ID NO:100, encoded, e.g., by SEQ ID NO:99).

In alternative embodiments, loadings for CBH 1, e.g., SEQ ID NO:46 and/or SEQ ID NO:34, can be either 2.5 mg/g cellulose or 5 mg/g cellulose, or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

In alternative embodiments, loadings for the xylanase, e.g., SEQ ID NO:524 and/or SEQ ID NO:100, can be 0.2 or 0.6 mg/g cellulose, or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more. Equivalent xylose conversion numbers can be attained using either 0.2 or 0.6 mg SEQ ID NO:524 (encoded, e.g., by SEQ ID NO:523)/g cellulose. In one aspect, a major benefit of this exemplary cocktail is improved xylose conversion.

In alternative embodiments, loadings for endoglucanase, e.g., SEQ ID NO:106, can be 1.7 mg/g cellulose or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

In alternative embodiments, loadings for oligomerase-1, e.g., SEQ ID NO:522, can be 0.5 mg/g cellulose or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

In alternative embodiments, loadings for CBH 2, e.g., SEQ ID NO:98, can be 1.0 mg/g cellulose or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

In alternative embodiments, loadings for arabinofuranosidase, e.g., SEQ ID NO:92, can be 0.25 mg/g cellulose or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

In alternative embodiments, loadings for xylanase, e.g., SEQ ID NO:102, can be 0.15 mg/g cellulose or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

In alternative embodiments, loadings for oligomerase-2, e.g., SEQ ID NO:520, can be 1.0 mg/g cellulose or anywhere in the range of between 0.05 and 10.0 mg/g cellulose, e.g., 0.05, 0.1; 0.2; 0.3; 0.4; 0.5; 1.0; 1.5; 2; 2.1; 2.2; 2.3; 2.4; 2.5 etc. including all values to about 10.0 mg/g cellulose, or more.

Exemplary Cocktails of the Invention Include:

E8* Cocktail

Enzyme                                            Family/class/type              Loading: Pure enzyme, mg/g cellulose
SEQ ID NO:106 (encoded, e.g., by SEQ ID NO:105)   Endoglucanase                  1.7
SEQ ID NO:522 (encoded, e.g., by SEQ ID NO:521)   Oligomerase-1 (β-glucosidase)  0.5
SEQ ID NO:46 (encoded, e.g., by SEQ ID NO:45)     Family 7 - CBH1                5; 2.5
SEQ ID NO:98 (encoded, e.g., by SEQ ID NO:97)     Family 6 - CBH2                1
SEQ ID NO:92 (encoded, e.g., by SEQ ID NO:91)     Arabinofuranosidase            0.25
SEQ ID NO:102 (encoded, e.g., by SEQ ID NO:101)   Family 10 - xylanase           0.15
SEQ ID NO:520 (encoded, e.g., by SEQ ID NO:519)   Oligomerase-2 (β-xylosidase)   1
SEQ ID NO:524 (encoded, e.g., by SEQ ID NO:523)   Xylanase                       0.1; 0.2; 0.3; 0.6
Total loading                                                                    Approx. 10 mg/g

FIG. 123 graphically illustrates data showing the percent xylan conversion over time for the exemplary “E*” cocktails of the invention noted immediately above comprising: 5 mg SEQ ID NO:46 and 0.2 mg SEQ ID NO:524; 2.5 mg SEQ ID NO:46 and 0.6 mg SEQ ID NO:524; 5 mg SEQ ID NO:46 and 0.6 mg SEQ ID NO:524.
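The arithmetic behind the total loading can be checked with a short calculation. The sketch below assumes that the total is simply the sum of the individual loadings listed in the cocktail table, with the CBH1 (SEQ ID NO:46) and xylanase (SEQ ID NO:524) values varied as in the three combinations named for FIG. 123. The dictionary, function names and the 50 g cellulose batch size are hypothetical.

```python
# Fixed loadings from the cocktail table, in mg pure enzyme per g cellulose.
BASE_LOADINGS = {
    "SEQ ID NO:106 (endoglucanase)": 1.7,
    "SEQ ID NO:522 (oligomerase-1)": 0.5,
    "SEQ ID NO:98 (CBH2)": 1.0,
    "SEQ ID NO:92 (arabinofuranosidase)": 0.25,
    "SEQ ID NO:102 (family 10 xylanase)": 0.15,
    "SEQ ID NO:520 (oligomerase-2)": 1.0,
}

def total_loading(cbh1, xylanase):
    """Sum the fixed loadings plus the chosen CBH1 and xylanase loadings (mg/g)."""
    return sum(BASE_LOADINGS.values()) + cbh1 + xylanase

def enzyme_mass_mg(loading_mg_per_g, cellulose_g):
    """Convert a loading in mg/g cellulose into mg of enzyme for a given batch."""
    return loading_mg_per_g * cellulose_g

# (CBH1, xylanase) combinations named in the text for FIG. 123.
variants = [(5.0, 0.2), (2.5, 0.6), (5.0, 0.6)]
totals = [round(total_loading(cbh1, xyl), 2) for cbh1, xyl in variants]

# Hypothetical batch: enzyme needed for 50 g cellulose at the first combination.
batch_mg = enzyme_mass_mg(totals[0], 50.0)
```

Under this simple-sum assumption the three combinations come to about 9.8, 7.7 and 10.2 mg/g cellulose, so the two combinations with 5 mg CBH1 match the stated total of approximately 10 mg/g.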

A number of aspects of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other aspects are within the scope of the following claims.

1. An isolated, synthetic or recombinant nucleic acid comprising (a) anucleic acid sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%,57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%,71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%,85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%,99%, or more or complete sequence identity to SEQ ID NO:1, SEQ ID NO:3,SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ IDNO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ IDNO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ IDNO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55, SEQ IDNO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ IDNO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ IDNO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ IDNO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ IDNO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ IDNO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125,SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ IDNO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153,SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ IDNO:163, SEQ ID NO:165, SEQ ID NO:167, SEQ ID NO:169, SEQ ID NO:171, SEQID NO:173, SEQ ID NO:175, SEQ ID NO:177, SEQ ID NO:179, SEQ ID NO:181,SEQ ID NO:183, SEQ ID NO:185, SEQ ID NO:187, SEQ ID NO:189, SEQ IDNO:191, SEQ ID NO:193, SEQ ID NO:195, SEQ ID NO:197, SEQ ID NO:199, SEQID NO:201, SEQ ID NO:203, SEQ ID NO:205, SEQ ID NO:207, SEQ ID NO:209,SEQ ID NO:211, SEQ ID NO:213, SEQ ID NO:215, SEQ ID NO:217, SEQ IDNO:219, SEQ ID NO:221, SEQ ID NO:223, SEQ ID NO:225, 
SEQ ID NO:227, SEQID NO:229, SEQ ID NO:231, SEQ ID NO:233, SEQ ID NO:235, SEQ ID NO:237,SEQ ID NO:239, SEQ ID NO:241, SEQ ID NO:243, SEQ ID NO:245, SEQ IDNO:247, SEQ ID NO:249, SEQ ID NO:251, SEQ ID NO:253, SEQ ID NO:255, SEQID NO:257, SEQ ID NO:259, SEQ ID NO:261, SEQ ID NO:263, SEQ ID NO:265,SEQ ID NO:267, SEQ ID NO:269, SEQ ID NO:271, SEQ ID NO:273, SEQ IDNO:275, SEQ ID NO:277, SEQ ID NO:279, SEQ ID NO:281, SEQ ID NO:283, SEQID NO:285, SEQ ID NO:287, SEQ ID NO:289, SEQ ID NO:291, SEQ ID NO:293,SEQ ID NO:295, SEQ ID NO:297, SEQ ID NO:299, SEQ ID NO:301, SEQ IDNO:303, SEQ ID NO:305, SEQ ID NO:307, SEQ ID NO:309, SEQ ID NO:311, SEQID NO:313, SEQ ID NO:315, SEQ ID NO:317, SEQ ID NO:319, SEQ ID NO:321,SEQ ID NO:323, SEQ ID NO:325, SEQ ID NO:327, SEQ ID NO:329, SEQ IDNO:331, SEQ ID NO:333, SEQ ID NO:335, SEQ ID NO:337, SEQ ID NO:339, SEQID NO:341, SEQ ID NO:343, SEQ ID NO:345, SEQ ID NO:347, SEQ ID NO:349,SEQ ID NO:351, SEQ ID NO:353, SEQ ID NO:355, SEQ ID NO:357, SEQ IDNO:359, SEQ ID NO:361, SEQ ID NO:363, SEQ ID NO:365, SEQ ID NO:367, SEQID NO:369, SEQ ID NO:371, SEQ ID NO:373, SEQ ID NO:375, SEQ ID NO:377,SEQ ID NO:379, SEQ ID NO:381, SEQ ID NO:383, SEQ ID NO:385, SEQ IDNO:387, SEQ ID NO:389, SEQ ID NO:391, SEQ ID NO:393, SEQ ID NO:395, SEQID NO:397, SEQ ID NO:399, SEQ ID NO:401, SEQ ID NO:403, SEQ ID NO:405,SEQ ID NO:407, SEQ ID NO:409, SEQ ID NO:411, SEQ ID NO:413, SEQ IDNO:415, SEQ ID NO:417, SEQ ID NO:419, SEQ ID NO:421, SEQ ID NO:423, SEQID NO:425, SEQ ID NO:427, SEQ ID NO:429, SEQ ID NO:431, SEQ ID NO:433,SEQ ID NO:435, SEQ ID NO:437, SEQ ID NO:439, SEQ ID NO:441, SEQ IDNO:443, SEQ ID NO:445, SEQ ID NO:447, SEQ ID NO:449, SEQ ID NO:451, SEQID NO:453, SEQ ID NO:455, SEQ ID NO:457, SEQ ID NO:459, SEQ ID NO:461,SEQ ID NO:463, SEQ ID NO:465, SEQ ID NO:467, SEQ ID NO:469, SEQ IDNO:471, SEQ ID NO:473, SEQ ID NO:475, SEQ ID NO:477, SEQ ID NO:479, SEQID NO:481, SEQ ID NO:483, SEQ ID NO:485, SEQ ID NO:487, SEQ ID NO:489,SEQ ID NO:491, SEQ ID NO:493, SEQ ID NO:495, SEQ 
ID NO:497, SEQ IDNO:499, SEQ ID NO:501, SEQ ID NO:503, SEQ ID NO:505, SEQ ID NO:507, SEQID NO:509, SEQ ID NO:511, SEQ ID NO:513, SEQ ID NO:515, SEQ ID NO:517,SEQ ID NO:521 or SEQ ID NO:523, over a region of at least about 20, 30,40, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650,700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150 or more residues,wherein the nucleic acid encodes a polypeptide having an oligomerase, acellulase, a cellulolytic activity, an endoglucanase, acellobiohydrolase, a beta-glucosidase, a xylanase, a mannanse, aβ-xylosidase or an arabinofuranosidase activity, (b) a nucleic acidsequence that hybridizes under stringent conditions to a nucleic acidcomprising SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ IDNO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ IDNO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ IDNO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ IDNO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ IDNO:49, SEQ ID NO:51, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ IDNO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ IDNO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ IDNO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ IDNO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ IDNO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119,SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ IDNO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147,SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ IDNO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163, SEQ ID NO:165, SEQID NO:167, SEQ ID NO:169, SEQ ID NO:171, SEQ ID NO:173, SEQ ID NO:175,SEQ ID NO:177, SEQ ID NO:179, SEQ ID NO:181, SEQ ID NO:183, 
SEQ IDNO:185, SEQ ID NO:187, SEQ ID NO:189, SEQ ID NO:191, SEQ ID NO:193, SEQID NO:195, SEQ ID NO:197, SEQ ID NO:199, SEQ ID NO:201, SEQ ID NO:203,SEQ ID NO:205, SEQ ID NO:207, SEQ ID NO:209, SEQ ID NO:211, SEQ IDNO:213, SEQ ID NO:215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID NO:221, SEQID NO:223, SEQ ID NO:225, SEQ ID NO:227, SEQ ID NO:229, SEQ ID NO:231,SEQ ID NO:233, SEQ ID NO:235, SEQ ID NO:237, SEQ ID NO:239, SEQ IDNO:241, SEQ ID NO:243, SEQ ID NO:245, SEQ ID NO:247, SEQ ID NO:249, SEQID NO:251, SEQ ID NO:253, SEQ ID NO:255, SEQ ID NO:257, SEQ ID NO:259,SEQ ID NO:261, SEQ ID NO:263, SEQ ID NO:265, SEQ ID NO:267, SEQ IDNO:269, SEQ ID NO:271, SEQ ID NO:273, SEQ ID NO:275, SEQ ID NO:277, SEQID NO:279, SEQ ID NO:281, SEQ ID NO:283, SEQ ID NO:285, SEQ ID NO:287,SEQ ID NO:289, SEQ ID NO:291, SEQ ID NO:293, SEQ ID NO:295, SEQ IDNO:297, SEQ ID NO:299, SEQ ID NO:301, SEQ ID NO:303, SEQ ID NO:305, SEQID NO:307, SEQ ID NO:309, SEQ ID NO:311, SEQ ID NO:313, SEQ ID NO:315,SEQ ID NO:317, SEQ ID NO:319, SEQ ID NO:321, SEQ ID NO:323, SEQ IDNO:325, SEQ ID NO:327, SEQ ID NO:329, SEQ ID NO:331, SEQ ID NO:333, SEQID NO:335, SEQ ID NO:337, SEQ ID NO:339, SEQ ID NO:341, SEQ ID NO:343,SEQ ID NO:345, SEQ ID NO:347, SEQ ID NO:349, SEQ ID NO:351, SEQ IDNO:353, SEQ ID NO:355, SEQ ID NO:357, SEQ ID NO:359, SEQ ID NO:361, SEQID NO:363, SEQ ID NO:365, SEQ ID NO:367, SEQ ID NO:369, SEQ ID NO:371,SEQ ID NO:373, SEQ ID NO:375, SEQ ID NO:377, SEQ ID NO:379, SEQ IDNO:381, SEQ ID NO:383, SEQ ID NO:385, SEQ ID NO:387, SEQ ID NO:389, SEQID NO:391, SEQ ID NO:393, SEQ ID NO:395, SEQ ID NO:397, SEQ ID NO:399,SEQ ID NO:401, SEQ ID NO:403, SEQ ID NO:405, SEQ ID NO:407, SEQ IDNO:409, SEQ ID NO:411, SEQ ID NO:413, SEQ ID NO:415, SEQ ID NO:417, SEQID NO:419, SEQ ID NO:421, SEQ ID NO:423, SEQ ID NO:425, SEQ ID NO:427,SEQ ID NO:429, SEQ ID NO:431, SEQ ID NO:433, SEQ ID NO:435, SEQ IDNO:437, SEQ ID NO:439, SEQ ID NO:441, SEQ ID NO:443, SEQ ID NO:445, SEQID NO:447, SEQ ID NO:449, SEQ ID NO:451, SEQ ID NO:453, SEQ 
ID NO:455,SEQ ID NO:457, SEQ ID NO:459, SEQ ID NO:461, SEQ ID NO:463, SEQ IDNO:465, SEQ ID NO:467, SEQ ID NO:469, SEQ ID NO:471, SEQ ID NO:473, SEQID NO:475, SEQ ID NO:477, SEQ ID NO:479, SEQ ID NO:481, SEQ ID NO:483,SEQ ID NO:485, SEQ ID NO:487, SEQ ID NO:489, SEQ ID NO:491, SEQ IDNO:493, SEQ ID NO:495, SEQ ID NO:497, SEQ ID NO:499, SEQ ID NO:501, SEQID NO:503, SEQ ID NO:505, SEQ ID NO:507, SEQ ID NO:509, SEQ ID NO:511,SEQ ID NO:513, SEQ ID NO:515, SEQ ID NO:517, SEQ ID NO:521 or SEQ IDNO:523, wherein the nucleic acid encodes a polypeptide having anoligomerase, a cellulase, a cellulolytic activity, an endoglucanase, acellobiohydrolase, a beta-glucosidase, a xylanase, a mannanse, aβ-xylosidase or an arabinofuranosidase activity, and the stringentconditions include a wash step comprising a wash in 0.2×SSC at atemperature of about 65° C. for about 15 minutes, and optionally thenucleic acid is at least about 20, 30, 40, 50, 60, 75, 100, 150, 200,300, 400, 500, 600, 700, 800, 900, 1000 or more residues in length orthe full length of the gene or transcript; (c) a nucleic acid sequenceencoding a polypeptide having the sequence of SEQ ID NO:2, SEQ ID NO:4,SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ IDNO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ IDNO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ IDNO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56, SEQ IDNO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ IDNO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ IDNO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ IDNO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ IDNO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116,SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ 
IDNO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143,SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ IDNO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQ ID NO:172,SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180, SEQ IDNO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ ID NO:190, SEQID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQ ID NO:200,SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:209, SEQ IDNO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ ID NO:218, SEQID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQ ID NO:228,SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236, SEQ IDNO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ ID NO:246, SEQID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQ ID NO:256,SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264, SEQ IDNO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ ID NO:274, SEQID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQ ID NO:284,SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292, SEQ IDNO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ ID NO:302, SEQID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQ ID NO:312,SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320, SEQ IDNO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ ID NO:330, SEQID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQ ID NO:340,SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348, SEQ IDNO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ ID NO:358, SEQID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQ ID NO:368,SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376, SEQ IDNO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ ID NO:386, SEQID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQ ID 
NO:396,SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404, SEQ IDNO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ ID NO:414, SEQID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQ ID NO:424,SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ IDNO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452,SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ IDNO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480,SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ IDNO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508,SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ IDNO:518, or SEQ ID NO:524; (d) the nucleic acid of (a), (b) or (c) andencoding a polypeptide having at least one amino acid conservativesubstitution and retaining its oligomerase, cellulase, cellulolyticactivity, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase,mannanse, β-xylosidase or arabinofuranosidase activity; (e) the nucleicacid of (a), (b), (c) or (d) encoding a polypeptide having anoligomerase, a cellulase, a cellulolytic activity, an endoglucanase, acellobiohydrolase, a beta-glucosidase, a xylanase, a mannanse, aβ-xylosidase or an arabinofuranosidase activity but lacking a signalsequence or a carbohydrate binding module; (f) the nucleic acid of (a),(b), (c), (d) or (e) encoding a polypeptide having an oligomerase, acellulase, a cellulolytic activity, an endoglucanase, acellobiohydrolase, a beta-glucosidase, a xylanase, a mannanse, aβ-xylosidase or an arabinofuranosidase activity but having aheterologous sequence, wherein optionally the heterologous sequencecomprises a heterologous signal sequence, carbohydrate binding module,catalytic domain (CD), or a 
combination thereof, and optionally the heterologous signal sequence, carbohydrate binding module or catalytic domain (CD) is derived from another oligomerase, cellulase or cellulolytic enzyme, or a non-oligomerase, cellulase or cellulolytic enzyme; or (g) a nucleic acid sequence fully complementary to (a), (b), (c), (d), (e) or (f).

2-3. (canceled)

4. The isolated, synthetic or recombinant nucleic acid of claim 1, wherein the cellulase activity comprises an endocellulase activity, an endoglucanase activity, a cellobiohydrolase activity, an β-glucosidase, a mannanase activity, or any combination thereof.

5-8. (canceled)

9. The isolated, synthetic or recombinant nucleic acid of claim 1, wherein the cellulase or oligomerase activity comprises catalyzing hydrolysis of 1,4-beta-D-glycosidic linkages; glucanase linkages; β-1,4- and/or β-1,3-glucanase linkages; endo-glucanase linkages; endo-1,4-beta-D-glucan 4-glucano hydrolase activity; internal endo-β-1,4-glucanase linkages; β-1,3-glucanase linkages; internal β-1,3-glucosidic linkages; glucopyranoses; and/or 1,4-β-glycoside-linked D-glucopyranoses.

10-26. (canceled)

27. The isolated, synthetic or recombinant nucleic acid of claim 1, wherein the cellulase or oligomerase activity is thermostable or thermotolerant.

28-47. (canceled)
 48. An isolated, synthetic or recombinant polypeptide (i)having an amino acid sequence having at least 50%, 51%, 52%, 53%, 54%,55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%,69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%,83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%,97%, 98%, 99%, or more, or 100% sequence identity to SEQ ID NO:2, SEQ IDNO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ IDNO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ IDNO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ IDNO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ IDNO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ IDNO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ IDNO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ IDNO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ IDNO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ IDNO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ IDNO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124,SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ IDNO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152,SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ IDNO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180,SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ IDNO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQID NO:200, SEQ ID NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:209,SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ IDNO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQID 
NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236, SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ ID NO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQ ID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264, SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ ID NO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQ ID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292, SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ ID NO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQ ID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320, SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ ID NO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQ ID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348, SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ ID NO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQ ID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376, SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ ID NO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQ ID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404, SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ ID NO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQ ID NO:424, SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQ ID NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ
ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, or SEQ ID NO:524, over a region of at least about 20, 25, 30, 35, 40, 45, 50, 55, 60, 75, 100, 150, 200, 250, 300 or more residues, (ii) having an amino acid sequence encoded by a nucleic acid of claim 1, wherein the polypeptide has an oligomerase, a cellulase or a cellulolytic activity or has immunogenic activity in that it is capable of generating an antibody that specifically binds to a polypeptide having the sequence of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:170, SEQ ID NO:172, SEQ ID NO:174, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:180, SEQ ID NO:182, SEQ ID NO:184, SEQ ID NO:186, SEQ ID NO:188, SEQ ID NO:190, SEQ ID NO:192, SEQ ID NO:194, SEQ ID NO:196, SEQ ID NO:198, SEQ ID NO:200, SEQ ID
NO:202, SEQ ID NO:204, SEQ ID NO:206, SEQ ID NO:209, SEQ ID NO:210, SEQ ID NO:212, SEQ ID NO:214, SEQ ID NO:216, SEQ ID NO:218, SEQ ID NO:220, SEQ ID NO:222, SEQ ID NO:224, SEQ ID NO:226, SEQ ID NO:228, SEQ ID NO:230, SEQ ID NO:232, SEQ ID NO:234, SEQ ID NO:236, SEQ ID NO:238, SEQ ID NO:240, SEQ ID NO:242, SEQ ID NO:244, SEQ ID NO:246, SEQ ID NO:248, SEQ ID NO:250, SEQ ID NO:252, SEQ ID NO:254, SEQ ID NO:256, SEQ ID NO:258, SEQ ID NO:260, SEQ ID NO:262, SEQ ID NO:264, SEQ ID NO:266, SEQ ID NO:268, SEQ ID NO:270, SEQ ID NO:272, SEQ ID NO:274, SEQ ID NO:276, SEQ ID NO:278, SEQ ID NO:280, SEQ ID NO:282, SEQ ID NO:284, SEQ ID NO:286, SEQ ID NO:288, SEQ ID NO:290, SEQ ID NO:292, SEQ ID NO:294, SEQ ID NO:296, SEQ ID NO:298, SEQ ID NO:300, SEQ ID NO:302, SEQ ID NO:304, SEQ ID NO:306, SEQ ID NO:308, SEQ ID NO:310, SEQ ID NO:312, SEQ ID NO:314, SEQ ID NO:316, SEQ ID NO:318, SEQ ID NO:320, SEQ ID NO:322, SEQ ID NO:324, SEQ ID NO:326, SEQ ID NO:328, SEQ ID NO:330, SEQ ID NO:332, SEQ ID NO:334, SEQ ID NO:336, SEQ ID NO:338, SEQ ID NO:340, SEQ ID NO:342, SEQ ID NO:344, SEQ ID NO:346, SEQ ID NO:348, SEQ ID NO:350, SEQ ID NO:352, SEQ ID NO:354, SEQ ID NO:356, SEQ ID NO:358, SEQ ID NO:360, SEQ ID NO:362, SEQ ID NO:364, SEQ ID NO:366, SEQ ID NO:368, SEQ ID NO:370, SEQ ID NO:372, SEQ ID NO:374, SEQ ID NO:376, SEQ ID NO:378, SEQ ID NO:380, SEQ ID NO:382, SEQ ID NO:384, SEQ ID NO:386, SEQ ID NO:388, SEQ ID NO:390, SEQ ID NO:392, SEQ ID NO:394, SEQ ID NO:396, SEQ ID NO:398, SEQ ID NO:400, SEQ ID NO:402, SEQ ID NO:404, SEQ ID NO:406, SEQ ID NO:408, SEQ ID NO:410, SEQ ID NO:412, SEQ ID NO:414, SEQ ID NO:416, SEQ ID NO:418, SEQ ID NO:420, SEQ ID NO:422, SEQ ID NO:424, SEQ ID NO:426, SEQ ID NO:428, SEQ ID NO:430, SEQ ID NO:432, SEQ ID NO:434, SEQ ID NO:436, SEQ ID NO:438, SEQ ID NO:440, SEQ ID NO:442, SEQ ID NO:444, SEQ ID NO:446, SEQ ID NO:448, SEQ ID NO:450, SEQ ID NO:452, SEQ ID NO:454, SEQ ID NO:456, SEQ ID NO:458, SEQ ID NO:460, SEQ ID NO:462, SEQ ID NO:464, SEQ ID NO:466, SEQ ID NO:468, SEQ ID NO:470, SEQ ID
NO:472, SEQ ID NO:474, SEQ ID NO:476, SEQ ID NO:478, SEQ ID NO:480, SEQ ID NO:482, SEQ ID NO:484, SEQ ID NO:486, SEQ ID NO:488, SEQ ID NO:490, SEQ ID NO:492, SEQ ID NO:494, SEQ ID NO:496, SEQ ID NO:498, SEQ ID NO:500, SEQ ID NO:502, SEQ ID NO:504, SEQ ID NO:506, SEQ ID NO:508, SEQ ID NO:510, SEQ ID NO:512, SEQ ID NO:514, SEQ ID NO:516, SEQ ID NO:518, or SEQ ID NO:524; (iii) having an amino acid sequence as set forth in (i) or (ii), or a polypeptide encoded by a nucleic acid of claim 1, and comprising at least one amino acid residue conservative substitution, wherein optionally the conservative substitution comprises replacement of an aliphatic amino acid with another aliphatic amino acid; replacement of a serine with a threonine or vice versa; replacement of an acidic residue with another acidic residue; replacement of a residue bearing an amide group with another residue bearing an amide group; exchange of a basic residue with another basic residue; or, replacement of an aromatic residue with another aromatic residue, or a combination thereof, and optionally the aliphatic residue comprises Alanine, Valine, Leucine, Isoleucine or a synthetic equivalent thereof; the acidic residue comprises Aspartic acid, Glutamic acid or a synthetic equivalent thereof; the residue comprising an amide group comprises Asparagine, Glutamine or a synthetic equivalent thereof; the basic residue comprises Lysine, Arginine or a synthetic equivalent thereof; or, the aromatic residue comprises Phenylalanine, Tyrosine or a synthetic equivalent thereof; (iv) the polypeptide of (i), (ii) or (iii) having an oligomerase, a cellulase or a cellulolytic activity but lacking a signal sequence or a carbohydrate binding module; (v) the polypeptide of (i), (ii), (iii) or (iv) having an oligomerase, a cellulase or a cellulolytic activity but having a heterologous sequence, wherein optionally the heterologous sequence comprises a heterologous signal sequence, carbohydrate binding module, catalytic domain (CD), or a
combination thereof, and optionally the heterologous signal sequence, carbohydrate binding module or catalytic domain (CD) is derived from another oligomerase, cellulase or cellulolytic enzyme, or a non-oligomerase, cellulase or cellulolytic enzyme.
 49. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the cellulase activity comprises an endocellulase activity, an endoglucanase activity, a cellobiohydrolase activity, a β-glucosidase, a mannanase activity, or any combination thereof.
 50. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the oligomerase activity comprises hydrolyzing (degrading) soluble oligomers to fermentable, monomeric sugars; hydrolyzing (degrading) soluble cellooligosaccharides and arabinoxylan oligomers into monomers, and optionally the monomers comprise xylose, arabinose and glucose; hydrolyzing (degrading) plant biomass polysaccharides; or hydrolyzing a glucan to produce a smaller molecular weight polysaccharide or oligomer.
 51-53. (canceled)
 54. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the cellulase and/or oligomerase activity comprises catalyzing hydrolysis of 1,4-beta-D-glycosidic linkages; hydrolysis of a 1,4-beta-D-glycosidic linkage in a cellulose, a cellulose derivative, a lichenin or a cereal; glucanase linkages; β-1,4-glucanase linkages; β-1,3-glucanase linkages; of endo-glucanase linkages; endo-1,4-beta-D-glucan 4-glucano hydrolase activity; of internal endo-β-1,4-glucanase linkages; β-1,3-glucanase linkages; internal β-1,3-glucosidic linkages; and/or 1,4-β-glycoside-linked D-glucopyranoses.
 55-66. (canceled)
 67. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the cellulase and/or oligomerase activity comprises hydrolyzing a cellulose, a cellulose derivative or a hemicellulose.
 68. The isolated, synthetic or recombinant polypeptide of claim 67, wherein the cellulase and/or oligomerase activity comprises hydrolyzing a cellulose or a hemicellulose in a wood, paper pulp, wood product or paper product, a plant biomass, wherein optionally the plant biomass comprises seeds, grains, tubers, plant wastes or byproducts of food processing or industrial processing, stalks, corn, cobs, stover, grasses, wherein optionally grasses are Indian grass or switch grass.
 69. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the cellulase and/or oligomerase activity comprises catalyzing hydrolysis of glucan in a feed, a food product or a beverage.
 70. The isolated, synthetic or recombinant polypeptide of claim 69, wherein the feed, food product or beverage comprises a cereal-based animal feed, a wort or a beer, a dough, a fruit or a vegetable.
 71. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the cellulase activity comprises catalyzing hydrolysis of a glucan in a microbial cell, a fungal cell, a mammalian cell, a plant cell or any plant material comprising a cellulosic part.
 72. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the cellulase and/or oligomerase activity is thermostable or thermotolerant.
 73-76. (canceled)
 77. An isolated, synthetic or recombinant polypeptide comprising a polypeptide as set forth in claim 48 and having a heterologous signal or leader sequence or a heterologous prepro sequence.
 78-79. (canceled)
 80. The isolated, synthetic or recombinant polypeptide of claim 48, wherein the polypeptide comprises at least one glycosylation site, and optionally the glycosylation is an N-linked glycosylation, and optionally the polypeptide is glycosylated after being expressed in a P. pastoris or a S. pombe.
 81-143. (canceled)
 144. A method for making a fuel comprising contacting a composition comprising a cellulose or a fermentable sugar with a polypeptide as set forth in claim 48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally the composition comprising a cellulose or a fermentable sugar comprises a plant, plant product or plant derivative, and optionally the plant or plant product comprises cane sugar plants or plant products, beets or sugarbeets, wheat, corn, soybeans, potato, rice or barley, and optionally the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, beta-glucosidase, xylanase, mannanase, β-xylosidase, arabinofuranosidase, and/or oligomerase activity, wherein optionally the oligomerase is an oligomerase-1 (a β-glucosidase) or an oligomerase-2 (a β-xylosidase), and optionally the fuel comprises a bioethanol or a gasoline-ethanol mix.
 145-163. (canceled)